Preface



The Spanish Association for Artificial Intelligence (AEPIA) was founded in 1983 
aiming to encourage the development of artificial intelligence in Spain. AEPIA 
is a member of the ECCAI (European Co-ordinating Committee for Artificial 
Intelligence) and a founder member of IBERAMIA, the Iberoamerican Conference
on Artificial Intelligence. Under the successive presidencies of Jose Cuena,
Francisco Garijo and Federico Barber, the association grew to its present healthy 
state. Since 1985, AEPIA has held a conference (CAEPIA) every second year. 
Since 1995 a Workshop on Technology Transfer of Artificial Intelligence (TTIA) 
has taken place together with CAEPIA. 

The CAEPIA-TTIA conferences were traditionally held mostly in Spanish 
and the proceedings were also published in our language. However, in order 
to promote an even more fruitful exchange of experiences with the international 
scientific community, the decision was made to publish a postproceedings English 
volume with the best contributions to CAEPIA-TTIA 2003. 

In fact, 214 papers were submitted from 19 countries and 137 were presented 
at the conference. Of these, 66 were selected for publication in this book, which
also includes an invited talk paper. We must express our gratitude to all
the authors who submitted their papers to our conference. 

The papers were reviewed by an international committee formed by 80 mem- 
bers from 13 countries. Each paper was reviewed by two or three referees, with 
an average of 2.7 reviews per paper. The papers included in this volume were 
submitted to a second review process. We must also express our deepest grat- 
itude to all the researchers whose invaluable contributions guaranteed a high 
scientific level for the conference. 

The conference was organized by the GALAN group of the Department of 
Computer Science, University of the Basque Country (EHU-UPV). We must 
mention that the Organizing Committee provided a really charming environment 
where researchers and practitioners could concentrate on the task of presenting 
and commenting on contributions. 

The conference was supported by the liberal sponsorship of the Spanish Min- 
isterio de Ciencia y Tecnologia, the Basque Government, the University of the 
Basque Country (Vicerrectorado del Campus de Gipuzkoa) and the Caja de 
Ahorros de Guipuzcoa-Kutxa. We thank them for the generous funding that
allowed the presence of invited speakers and granted students. 
Abstract. Artificial Intelligence (AI) technology has been extended for use in
tutoring systems to dynamically customize material for individual students. 
These techniques model and reason about the student, the domain and teaching 
strategies, and communicate with the student in real time. Evaluation results 
show increased learning, reduced costs, and improved grades. 

We will demonstrate intelligent and distributed technology that makes educa- 
tion available anytime and anyplace. At the grade school level, a mathematics 
tutor positively influences students’ confidence and image of their mathematics 
ability. Machine learning was used to model student performance and to derive 
a teaching policy to meet a desired educational goal. At the college level, an in- 
quiry tutor moves students towards more active and problem-based learning. 
We will also discuss other tutors that introduce new pedagogy and address in- 
equities in the classroom. 



1 Artificial Intelligence in Education 

The field of Artificial Intelligence in Education (AIED) is relatively new, being less
than thirty years old. Broadly defined, AIED addresses issues of knowledge and learning
and is not limited solely to production of functional intelligent tutors. Issues and 
questions addressed by this field include: 

What is the nature of knowledge? How is knowledge represented? 

How can an individual student be helped to learn? 

What styles of teaching interactions are effective and when? 

What misconceptions do learners have? 

The field has developed answers to some of these questions, and artificial intelligence 
(AI) techniques have enabled intelligent tutors to adapt both content and navigation 
of material to a student’s learning needs. The goal of AI in Education is not to repro- 
duce existing classroom teaching methods. In fact, in some cases the goal is to re- 
move the traditional education mold altogether. As working and learning become 
increasingly the same activity, the demand for lifelong learning creates a demand for 
education that will exceed the capability of traditional institutions and methods. This 
creates an opportunity for new intermediaries and learning agents that are not part of 
the traditional, formal education system. Such opportunities are likely to be supported 
by computer technology. 
Ample evidence exists that intelligent tutors produce a substantial improvement in
learning and productivity in industry and the military. Formal evaluations show that
intelligent tutors produce the same improvements as one-on-one human tutoring, which
increases performance to around the 98th percentile in a standard classroom [1]. These
tutors effectively reduce by one-third to one-half the time required for learning [2],
increase effectiveness by 30% as compared to traditional
instruction [3, 2, 4], and networked versions reduce the need for training support per- 
sonnel by about 70% and operating costs by about 92%. 

The term “intelligent tutor” designates technology-based instruction that contains 
one or more of the following features: generativity, student modeling, expert model- 
ing, mixed initiative, interactive learning, instructional modeling and self-improving. 
The key feature is generativity - the system’s ability to generate customized prob- 
lems, hints or help - as opposed to the presentation of prepared “canned” instruction. 
Generativity relies on models of the subject matter, the student and tutoring, which 
enable the tutor to generate customized instruction as needed by an individual stu- 
dent. Advanced instructional features, such as mixed-initiative (a tutor that both initi- 
ates interactions and responds usefully to student actions) and self-improving (a tutor 
that evaluates and improves its performance as a result of experience), set tutors apart 
from earlier computer-aided instructional systems. No agreement exists on which 
features are absolutely necessary and it is more accurate to think of teaching systems 
as lying along a continuum that runs from simple frame-oriented systems to very 
sophisticated intelligent tutoring. The most sophisticated systems include, to varying 
degrees, the features listed above. 

For example, the Arithmetic Tutor described below was generative since all math
problems, hints and help were generated on the fly based on student learning needs
(Figs. 1-3). The tutor modeled expert knowledge of arithmetic as a topic network, with
nodes such as “subtract fractions” or “multiply whole numbers” which were resolved into
child nodes such as “find least common denominator” and “subtract numerators” (Fig. 4).
The tutor modeled student knowledge, recording each sub-task learned or needed based on
student action, and the tutor was self-improving in that it used machine-learning
techniques to predict a student’s ability to correctly solve a problem.

Fig. 1. Real World Context: In AnimalWatch, the student chooses an endangered species
from among the Right Whale, Giant Panda and Takhi Wild Horse.

Fig. 2. Interface of AnimalWatch for a simple addition of whole numbers problem.



2 Customizing Help by Gender and Cognitive Development 

The first example tutor, AnimalWatch, used AI techniques to adapt its tutoring of 
basic arithmetic and fractions, Figs. 1-3. It helped students learn fractions and whole 
numbers at a 4th-6th grade level. The tutor used student characteristics including
gender and cognitive development, and an overlay student model which made inferences
about the student’s knowledge as he/she solved problems. The tutor adjusted its prob- 
lem selection to provide appropriate problems and hints. For example, students un- 
able to handle abstract thinking (according to a Piagetian pre-evaluation) benefited 
from concrete representations and concrete objects to manipulate instead of formal 
approaches, equations, or symbols and textual explanations, Figs. 5-6. Students 
moved through the curriculum only if their performance for each topic was accept- 
able. Thus problems generated by the tutor were an indication of the student’s 
mathematics proficiency and the tutor’s efficiency as described below. 

Results indicated that girls were more sensitive to the amounts of help than to the 
level of abstraction (e.g., the use of concrete objects to manipulate, Fig. 7, vs. equa- 
tions and procedures, Fig. 6) and performed better in problems when the help was 
highly interactive. Boys, affected by the level of abstraction, were more prone to 
ignore help and to improve more when help had low levels of interactivity. 

AnimalWatch tutored arithmetic using word problems about endangered species, thus
integrating mathematics, narrative and biology. Math problems were designed to motivate
students to use mathematics in the context of practical problem solving, embedded in an
engaging narrative (Figs. 1 and 2). Students “worked” with scientists as they explored
environmental issues around saving endangered animals. AnimalWatch maintained a student
model and made inferences about the
student’s knowledge as s/he solved problems. It increased the difficulty of the prob- 
lems depending on the student’s progress and provided mathematics instruction for 
each student based on a dynamically updated probabilistic student model. Problems 
were dynamically generated based on inferences about the student’s knowledge, pro- 
gressing from simple one-digit whole-number addition problems to complex prob- 
lems that involve fractions with different denominators. 
The student's cognitive level was determined via an on-line pretest [5] based on
Piaget's theory of development and a series of questions on topics such as combina- 
torics, proportions, conservation of volume and other elements that determine student 
ability to reason abstractly. 

Several evaluation studies with 10- and 11-year-old students, totaling 313 children,
indicate that AnimalWatch provided effective individualized math instruction and had a
positive impact on students' own mathematics self-concept and belief in the value of
learning mathematics [6, 7]. When a student encountered a difficult problem,
AnimalWatch provided hints classified along two dimensions, symbolism and
interactivity. The first hints provided little information, but if the student kept enter- 
ing wrong answers, AnimalWatch provided hints that ultimately guided the student 
through the whole problem-solving process. 

Expert Model. The expert model was arranged as a topic network where nodes rep- 
resented skills to be taught. Fig. 4. The links between nodes frequently represented a 
prerequisite relationship. For instance, the ability to add is a prerequisite to learning 
how to multiply. Topics were major components of the curriculum, e.g., “add 
fractions” or “divide wholes”, while skills referred to any curriculum elements 
(including topics), e.g., “recognize numeration” or “recognize denominator.” Sub- 
skills were steps within a topic that the student performs in order to accomplish a 
task. For example, the topic “adding fractions” had the subskills of finding a least
common multiple (LCM) of the denominators, converting the fractions to an equivalent form with a
new numerator, adding the numerators, simplifying the result, and making the result 
proper. 
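A minimal sketch of how a topic network with prerequisite links could be represented, assuming a plain dictionary. The names `PREREQUISITES` and `ready_topics` are hypothetical, and apart from "add before multiply" (stated above) the links are invented for illustration; this is not the AnimalWatch implementation.

```python
# Hypothetical topic network: each topic maps to its prerequisite topics.
# Only the "add wholes" -> "multiply wholes" link is stated in the text;
# every other link is an assumption made for illustration.
PREREQUISITES = {
    "add wholes": [],
    "subtract wholes": ["add wholes"],
    "multiply wholes": ["add wholes"],
    "divide wholes": ["multiply wholes", "subtract wholes"],
    "add fractions": ["add wholes", "multiply wholes"],
}


def ready_topics(mastered):
    """Return unmastered topics whose prerequisites are all mastered."""
    return sorted(
        topic
        for topic, prereqs in PREREQUISITES.items()
        if topic not in mastered and all(p in mastered for p in prereqs)
    )


print(ready_topics(set()))            # ['add wholes']
print(ready_topics({"add wholes"}))   # ['multiply wholes', 'subtract wholes']
```

A tutor built this way would only offer topics returned by `ready_topics`, which mirrors the rule that students advance only after acceptable performance on earlier topics.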

Table 1. Three sample add-fraction problems and the subskills required for each.

Subskill               Problem 1      Problem 2      Problem 3
                       1/3 + 1/3      1/3 + 1/4      2/3 + 5/8
Find LCM               No             Yes            Yes
Equivalent fractions   No             Yes            Yes
Add numerators         Yes            Yes            Yes
Make proper            No             No             Yes




Fig. 4. A Portion of the AnimalWatch topic network. 
Note that a topic such as “add wholes” was both a prerequisite and a subskill for “add
fractions.” For a given problem, not all these subskills were required. Table 1 shows
the subskills required for some sample add-fraction problems.

Fig. 2 is an example of a simple problem for addition of whole numbers and Fig. 3 an
example of a topic generated from the “prefractions” area of the curriculum. Generating
the topic customized to learning needs was one way the tutor adapted the curriculum to the
learning needs of the student. Students moved through the curriculum only if their 
performance for each topic was acceptable. 

Student Model. The student model continually updated an estimate of each student's 
ability and understanding of the mathematics domain and generated problems of 
appropriate difficulty. Students were given hints with little information first and 
richer explanations later if the former were not effective. The student model adjusted 
the difficulty of each problem and constructed each hint dynamically based on presumed
student learning needs. For example, addition-of-fractions problems vary widely in their
degree of difficulty (Table 1). The more subskills required, the harder
the problem. Similarly, larger numbers also increased the tutor's rating of the prob- 
lem's difficulty: it is harder to find the least common multiple of 13 and 21 than it is 
to find the least common multiple of 3 and 6. 

Subskills referred to steps necessary to solve a problem. For example, 1/3 + 1/3
involves fewer subskills than 2/3 + 5/8, which also requires finding a common multiple,
making the result proper, etc. Problem difficulty was calculated via a heuristic that
took into account the subskills used and the difficulty in applying these subskills.
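The subskill analysis in Table 1 can be sketched in a few lines. The function names and the scoring rule in `difficulty` (one point per subskill plus a small term that grows with the common multiple) are assumptions for illustration; the text does not give the actual heuristic.

```python
from math import gcd


def required_subskills(n1, d1, n2, d2):
    """List the subskills needed to add n1/d1 + n2/d2 (compare Table 1)."""
    skills = []
    if d1 != d2:
        skills += ["find LCM", "equivalent fractions"]
    skills.append("add numerators")
    common = d1 * d2 // gcd(d1, d2)                # least common multiple
    numerator = n1 * (common // d1) + n2 * (common // d2)
    if gcd(numerator, common) > 1:
        skills.append("simplify")
    if numerator > common:                         # sum is greater than 1
        skills.append("make proper")
    return skills


def difficulty(n1, d1, n2, d2):
    """Hypothetical rating: one point per subskill plus a number-size term."""
    common = d1 * d2 // gcd(d1, d2)
    return len(required_subskills(n1, d1, n2, d2)) + common / 100


print(required_subskills(1, 3, 1, 3))
print(required_subskills(2, 3, 5, 8))
print(difficulty(1, 13, 1, 21) > difficulty(1, 3, 1, 6))
```

Under this assumed scoring, a problem with denominators 13 and 21 rates harder than one with denominators 3 and 6, matching the observation about larger numbers above.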

AnimalWatch adjusted problems based on individual learning needs, as suggested in the
student model, and selected a hint template, perhaps describing procedural rules
(Fig. 6) or requiring manipulation of small rods on the screen (Fig. 7). The machine
learner (described in Section 3) recorded the effectiveness of each hint and the results
of using specific problems in a database used to generate problems and hints for
subsequent students.

Fig. 5. The tutor provided a textual hint (“Are you sure you are trying to divide 125
by 5?”) in response to the student’s incorrect answer. It later provided a symbolic or a
manipulative hint (Figs. 6 and 7).

Fig. 6. The tutor provided a symbolic hint demonstrating the processes involved in long
division. Word problem shown on screen: “In zoos, the animals live close together. If
one horse gets sick, the others can catch the illness quickly. One illness that horses
get is called strangles. It is a bit like mumps in people. The horse may not be able to
eat. Some even starve to death! The medicine works well but it is expensive. A bottle
costs 125 dollars. It will treat 5 horses. How much does it cost to treat 1 horse?”

Figs. 5 and 6 demonstrate hints that provide varying amounts of information. The hint in
Fig. 5, bottom left, is brief and text based, while the hint in Fig. 6 is symbolic and
the hint in Fig. 7 is
interactive, requiring manipulation of rods. If the student continued to make mistakes, 
more and more specific hints were provided until the question was answered cor- 
rectly. 
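The escalation described above can be sketched as a lookup from the number of wrong answers to a hint class. The fixed order (textual, then symbolic, then interactive) is an assumption read off Figs. 5-7, and `HINT_LEVELS` and `next_hint` are hypothetical names; the actual tutor also consulted the student model when choosing hints.

```python
# Hint classes in the order the figures present them: textual (Fig. 5),
# symbolic (Fig. 6), interactive (Fig. 7). The fixed order is an assumption.
HINT_LEVELS = ["textual", "symbolic", "interactive"]


def next_hint(wrong_answers):
    """Pick a hint class from the number of wrong answers so far."""
    if wrong_answers < 1:
        return None                      # no mistake yet, so no hint
    index = min(wrong_answers, len(HINT_LEVELS)) - 1
    return HINT_LEVELS[index]


for mistakes in range(5):
    print(mistakes, next_hint(mistakes))
```

Once the most specific class is reached, further mistakes keep returning it, which matches hints continuing until the question is answered correctly.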

The student model noted how long the student took to generate a response, both af- 
ter the initial problem presentation and (in the event of an incorrect response), the 
delay in responding after a hint was presented. The student's level of cognitive devel- 
opment [5], according to Piaget's theory of intellectual development [8], correlated 
with the student's math performance and was used to further customize the tutor's 
teaching [9]. 

When the student was presented with an opportunity to provide an answer, the system
took a “snapshot” of his/her current state, consisting of information from four main
areas: 

Student: The student's level of proficiency and level of cognitive development 

Topic: How hard the current topic is and the type of operand/operators 

Problem: How complex is the current problem 

Context: Describes the student's current efforts at answering this question, and the
hints he/she has seen.

Empirical evaluation, Section 5, showed that student reaction to each class of hint
was dependent on gender and cognitive development. 



Fig. 7. Finally the tutor provided an interactive hint, in which the student moved five
groups of rods, each containing 25 units.



3 Machine Learning Techniques to Improve Problem Choice 

A machine learning component, named ADVISOR, studied the records of previous users and
predicted how much time the current student might require to solve a problem by using a
“two-phase” learning algorithm (Fig. 8). Machine learning automatically computed an
optimal teaching policy for a given goal, such as reducing the number of mistakes made
or the time spent on each problem. The architecture included two learning agents: one
responsible for modeling how a student interacted with the tutor (the population student
model, PSM) and the other responsible for constructing a teaching policy (the
pedagogical agent, PA). The population student model was trained to understand student
behavior by observing hundreds of students using the tutor and was capable of predicting
an individual student’s reaction to teaching actions, such as presentation of a specific
problem type.

The PSM took the current state and a proposed teaching action and then acted as a
simulation of the future actions of the current student. It predicted the probable response
of the current student in terms of time taken and correctness of solution. This infor- 
mation updated the state description which, along with another proposed action, was 
fed back into the PSM for the next iteration of the simulation. This process continued 
until the PSM predicted that the student would give a correct response. 
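The simulation loop described above can be sketched as follows. Everything here is a stand-in: `toy_psm` uses invented numbers in place of the trained model, and the state keys, action names and the 50% threshold for "predicted correct" are assumptions for illustration.

```python
def toy_psm(state, action):
    """Stand-in for the PSM: return (seconds, percent chance of a correct answer).

    The formula and timings are invented; the real PSM was learned from logs.
    """
    percent = min(95, 30 + 20 * state["hints_seen"] + 10 * state["proficiency"])
    seconds = 20.0 if action == "interactive hint" else 10.0
    return seconds, percent


def simulate(state, policy, max_steps=20):
    """Roll the model forward until it predicts a correct response."""
    state = dict(state)                      # leave the caller's state untouched
    total_seconds = 0.0
    for _ in range(max_steps):
        action = policy(state)               # proposed teaching action
        seconds, percent = toy_psm(state, action)
        total_seconds += seconds
        if percent >= 50:                    # predicted correct: stop simulating
            break
        state["mistakes"] += 1               # update the state description
        state["hints_seen"] += 1
    return total_seconds, state["mistakes"]


start = {"proficiency": 0, "hints_seen": 0, "mistakes": 0}
print(simulate(start, lambda s: "text hint"))
```

A pedagogical agent could compare the predicted totals for several candidate policies and keep the one that best meets its goal, for example the lowest predicted time.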

The PSM and PA worked together to enable the tutor to learn a good teaching pol- 
icy directly from observing previous students using the tutor. The architecture was 
evaluated by comparing it to a “classical” tutor that taught via heuristics without the
aid of a machine learning agent (Section 4). The metrics used to assess student
performance included subjective measures (e.g., mathematics self-confidence and
enjoyment) as well as objective measures (e.g., mathematics performance).

The learning component modeled student behavior at a coarse level of granularity. 
Rather than focusing on whether the student knew a particular piece of knowledge, 
the learning agent determined how likely the student was to answer a question cor- 
rectly and how much time was needed to generate this correct response. The learning 
mechanisms predicted whether the student's response would be correct or incorrect 
and how long the student would take to respond. This model was contextualized on 
the current state of the world or the problem being solved. 

State Information Recorded. To make predictions, data about an entire population 
of users was gathered. Logs gathered from prior deployments of AnimalWatch were 
used as training data for the PSM. Over 10,000 data points and 48 predictor variables 
were used. These logs provided the training instances for a supervised learning agent. 
Approximately 120 students used the tutor (a medium sized study) for a brief period 
of time, only 3 hours. The goal of the induction techniques was to construct two pre- 
diction functions: one for the amount of time a student required to respond and the 
second for the probability that the response was correct. Rather than making
predictions about an “average student,” the model made predictions about “an average
student with a proficiency of X who has made Y mistakes so far on the problem.” The
PSM was a student model since it differentially predicted student performance.

Fig. 8. Overview of the Machine Learning Component. [Diagram labels: teaching goal,
teaching action, Population Student Model, Pedagogical Agent, teaching policy.]

The PSM recorded information from a variety of sources, including information 
about the student, the current problem being worked on, feedback presented to the 
student and context or the student’s current efforts at solving the problem. Informa- 
tion about the student included data about the student's abilities, such as the tutor's 
knowledge about both the student's proficiency in this particular area and the stu- 
dent's overall capabilities. Student data also included proficiency on current topic, 
scores on individual cognitive development test items, and gender. Current problem
information included the number of subskills tested, the type of problem (operator and
operand types), and problem difficulty.

Feedback Information. When AnimalWatch selected a hint, it based its decision on 
a set of hint features that provided information along several hint dimensions. These 
features described hints at a pedagogical level and were generalized among all hints 
in the system, i.e. the PSM learned about the impact of “highly interactive” hints, not 
specifically about “hint #12.” The pedagogical features of each hint included: 

• interactivity 

• procedural information contained in the hint 

• information about the result conveyed by the hint (e.g. “try again” versus telling 
the student to “divide 125 by 5”) 

• information about the context 

• data about the context of the problem representing information about the student's 
current effort while solving the problem. 

There were two regression models. The first had 48 input features and determined 
the probability the student's next response would be correct. The second had the same 
48 features as input, and also used the first model's output (i.e. whether the response 
was correct or incorrect). The second model was responsible for predicting the 
amount of time the student would take until his next response. Note that the model 
did not try to predict the student's longer term performance on the current problem, 
only his/her immediate action. 
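A minimal sketch of how two such models can be chained, with the first model's output passed to the second as an extra input. The logistic form of the first model, the three-feature example and all weights are assumptions for illustration; the text states only that there were two regression models over 48 features.

```python
import math


def predict_correct(features, weights, bias):
    """First model: probability that the next response is correct.

    A logistic form is assumed here; the source does not name the model type.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_seconds(features, p_correct, weights, correct_weight, bias):
    """Second model: response time from the same features plus model 1's output."""
    linear = sum(w * x for w, x in zip(weights, features))
    return bias + correct_weight * p_correct + linear


def predict_response(features, model1, model2):
    """Run both models in sequence for one student state."""
    p_correct = predict_correct(features, *model1)
    return p_correct, predict_seconds(features, p_correct, *model2)


# Hypothetical weights for a three-feature example (the real models had 48 inputs).
model1 = ([1.0, 1.0, 1.0], 0.0)
model2 = ([0.0, 0.0, 0.0], -10.0, 30.0)
print(predict_response([0.0, 0.0, 0.0], model1, model2))
```

Feeding the correctness prediction into the time model lets the time estimate differ for likely-correct and likely-incorrect responses without training separate time models.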



4 Evaluations of the Machine Learner 

AnimalWatch was evaluated in classrooms numerous times, with different versions 
of the tutor, deployed in both rural and urban schools, and evaluated with and without 
the machine learning component. Students were randomly assigned to one of two 
conditions: the experimental condition used the ADVISOR and machine learning to 
direct its reasoning; the control condition used the classic AnimalWatch tutor. The 
only difference between the two conditions was that in the experimental group artifi- 
cial intelligence methods were used to make the selection of topics, problems, and 
feedback. The AnimalWatch story-line, word problems, etc. were identical in both 
groups. 
Table 2. Summary of student performance by topic area. The experimental group used
ADVISOR, the machine learner tutor; the control group used the heuristic tutor.
Percentage refers to the proportion of problems seen and time refers to the time needed
to solve a problem.

Topic area    Measure      Control     Experimental
Whole         Percentage   73.6%       60%
              Time         43.4 sec    28.1 sec
Prefraction   Percentage   19.3%       27.3%
              Time         22.7 sec    21.7 sec
Fraction      Percentage   7.2%        12.7%
              Time         44.5 sec    38.5 sec

ADVISOR was given the goal of minimizing the amount of time students spent per problem.
Evidence of ADVISOR’s ability to adapt instruction can be seen in Table 2. The
percentage field refers to the proportion of problems of a specified type students in
each condition saw. For example, in the control condition 7.2%
of the problems solved were fraction problems. The time field refers to the mean time 
needed to solve a problem. Students using ADVISOR averaged 27.7 seconds to solve 
a problem, while students using the classic version of AnimalWatch averaged 39.7 
seconds. This difference was significant at P<0.001. Just as important, the difference
was meaningful: reducing average times by 30% is a large reduction. Thus, the agent 
made noticeable progress in its goal of reducing the amount of time students spent per 
problem. 
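As a consistency check, the overall averages can be approximately recovered from the per-topic rows of Table 2 by weighting each topic's mean time by its share of problems. This is only a sketch: the per-topic figures are rounded, and the reported averages may have been computed directly from individual problems, so small discrepancies are expected. The names `TABLE_2` and `weighted_mean_seconds` are hypothetical.

```python
# (share of problems seen, mean seconds per problem), copied from Table 2.
TABLE_2 = {
    "control": {
        "whole": (0.736, 43.4),
        "prefraction": (0.193, 22.7),
        "fraction": (0.072, 44.5),
    },
    "experimental": {
        "whole": (0.600, 28.1),
        "prefraction": (0.273, 21.7),
        "fraction": (0.127, 38.5),
    },
}


def weighted_mean_seconds(rows):
    """Average time per problem, weighting each topic by its share of problems."""
    total_share = sum(share for share, _ in rows.values())
    return sum(share * secs for share, secs in rows.values()) / total_share


for condition, rows in TABLE_2.items():
    print(condition, round(weighted_mean_seconds(rows), 1))

# Relative reduction implied by the reported averages (39.7 s and 27.7 s).
reduction = (39.7 - 27.7) / 39.7
print(round(reduction, 3))
```

The reconstructed means land close to the reported 39.7 and 27.7 seconds, and the reported averages imply a reduction of roughly 30%.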

Equivalent students did perform differently when using ADVISOR as compared with those
using the classic version, without the learning component. Again, this was evidence that
the architecture can adapt, not that it caused 30% more “learning” to occur.

Students in the experimental group solved whole (P<0.001) and fraction problems (P<0.02)
significantly faster than students in the control group. Students in the experimental
group finished the whole number and prefraction topics relatively quickly, so worked
more on fraction problems.



Fig. 9. PSM's accuracy for predicting response time (axis label: predicted time).
The PSM's predictions correlated at 0.63 (R^2 = 0.40) with actual student performance.



Fig. 9 shows the PSM's accuracy for predicting how long students required to generate a
response. In this graph, both the predicted and actual response times are measured in
milliseconds, and then the log10 is taken¹. The PSM's predictions correlated at
0.63 (R^2 = 0.40) with actual performance.
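The two figures quoted here follow from simple arithmetic, sketched below. The helper name `log_ms` is hypothetical; it applies the same transformation the graph uses (seconds converted to milliseconds, then log base 10).

```python
import math


def log_ms(seconds):
    """Return log10 of a duration expressed in milliseconds."""
    return math.log10(seconds * 1000)


r = 0.63
print(round(r ** 2, 2))                  # squared correlation, about 0.40

# Whole-second response times fall on fixed horizontal positions after the
# transformation, starting at log10(1000) = 3 for one second.
for seconds in (1, 2, 3):
    print(seconds, round(log_ms(seconds), 3))
```

Because times were logged in whole seconds, the transformed values cluster on these few positions at the low end of the scale.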



5 Empirical Evaluation of AnimalWatch

AnimalWatch, with or without the learning component, was effective at teaching 
arithmetic. Additionally, it enabled researchers to evaluate the behavior of students 
with respect to solving math problems based on gender and cognitive development. 
The behavior of girls and boys of the same cognitive development was quite opposite in
terms of their response to hints. In general, the best help types for one gender were 
the worst for the other gender. The tutor pushed students forward, going from simple 
whole number addition problems to others that involved fractions with different de- 
nominators, based in part on the tutor’s ability to correct student mistakes through 
help provided. The tutor recorded the effectiveness of hints and the results of using 
specific problems. The number of mistakes a student made on problems of a similar 
type continued to reduce, showing that they learned the topic. 

Fig. 10 illustrates the number of errors (Y axis) measured against the problems of a 
given type (X axis). The problem types are listed on the right of Fig. 10. Clearly
students learned each topic, as errors were reduced from greater than 2 to less than 1.
For more difficult topics, e.g. hard division, the learning curve was slower and in- 
volved making more errors. 




Fig. 10. Teaching effectiveness of AnimalWatch. Mistake reduction over problems of
similar type. [Legend: Easy addition, Hard addition, Easy subtraction, Hard subtraction,
Easy multiplication, Hard multiplication, Easy division, Hard division, Fraction
readiness 1, Fraction readiness 2, Fraction readiness 3, Like fractions, Unlike
fractions, Average trend.]



¹ Since the granularity of record keeping was at the second level, horizontal “bands”
appear at the bottom of the graph. The bottom band is one second, the next is two
seconds, etc. (1 second = 1,000 milliseconds, log10 1000 = 3).




Reasoning about Teaching and Learning



Formal help was significantly worse than other kinds of help for low cognitive de- 
velopment boys. Also, formal help was significantly worse for low than for high 
cognitive development boys. Formal help produced significantly worse mistake 
change rates for boys of low cognitive development than for girls of the same cogni- 
tive development. Low cognitive development girls improved significantly less with 
reduced help than girls of high cognitive development with reduced help. Also, girls 
of high cognitive development improved most with reduced help. 

While girls’ math value and self-confidence were affected positively by the existence
of intense help, boys’ math value was harmed. Some boys may have felt bothered by
structured help, especially when we consider they spent 25% less time at each hint than
girls. Too much help when they didn’t need it slowed them down; they probably went
through all the help when they could have figured it out by themselves.

High cognitive development boys behaved opposite to high cognitive development girls:
girls profited from different amounts of help depending on the topic, while boys
profited from intense help throughout. Gender differences were apparent in the time that
students spent on hints (independent samples t-test, p<0.05). On average, girls spent
25% more time than boys within hints. Overall, girls mastered a similar number of topics
compared with boys. Boys ignored help more (especially high cognitive development boys)
and appeared to be more selective about the help provided to them.



6 Implementing Difficult Pedagogy: Inquiry Learning 



The previous tutor supported a teaching strategy used in most classrooms: present a
problem to a student and then support him/her to solve the problem by providing hints as
needed. One advantage of AnimalWatch is that, unlike a human teacher with 30 students,
the tutor was able to individualize problems and hints for each student. The next tutor
we discuss implements a tutoring strategy that is extremely difficult to implement in
any classroom.

The inquiry tutor helps students ask their own questions and refine them so they can be
answered through gathering data in a laboratory or library situation. During inquiry,
students are presented with a case, situation and goals. They are guided to observe and
synthesize their observations. Inquiry teaching is very expensive in terms of time and
resources. It requires that a teacher track, analyze and then comment on each student’s
selection of data and creation of hypotheses and inferences. Since student groups might
pursue dissimilar questions and require distinct data, supporting inquiry in the
traditional classroom is very difficult.

Fig. 11. Rashi presents a medical case to the student. Navigation icons (bottom and
right) are always available, providing access to the Inquiry Notebook, coach and
glossary. Icons on the top right represent data collection tools. Statement of Case
shown in the figure: “Janet Stone, a 21-year-old woman, comes to see you with a 6-month
history of increasing nervousness, irritability, palpitations, and heat intolerance. She
has lost 20 pounds over this period of time, despite a good appetite. One year ago, she
was jogging 10 miles a week. However, she has not done any running in the past 8 months
because she "doesn't feel up to it" and seems to have some weakness in her legs. She
reports no previous medical problems.”

Rashi², the inquiry tutor, helps students generate hypotheses and select data. It
encourages students to support or refute hypotheses with sufficient evidence. Coaches
advise students about illogical statements and inconsistent reasoning and help them
organize and qualify their knowledge. The tutor understands (to some extent) the
reasoning behind students’ hypotheses. Prototype inquiry tutors exist in human
biology, forestry, civil engineering and geology [10, 11].




Fig. 12. Interview Tool. The student interviews the patient through free text, by
typing “diet” into the tool. The patient answers in audio, video and transcript.




Fig. 13. Patient Examination Tool enables 
students to measure weight, pulse, blood 
pressure, etc. In this example the student has 
selected the head and is given choices of 
viewing exam results for eyes, ears, neck, etc. 



In the human biology inquiry tutor, a patient presents with symptoms (Fig. 11),
including fatigue, weight loss, anxiety, and sweaty palms. Students try to diagnose
the cause of these symptoms by extracting pertinent information and trying to
recognize the difference between pure observation and inference. The patient’s
complaints form an initial set of data from which the student begins the diagnostic
process.

Students brainstorm and list predictions or subgoals that might resolve some aspect
of the problem. They type in causes for the observed phenomena and later predict data
that will either support or weaken the hypotheses. An initial hypothesis might be
vague: “She sounds like she might have an anxiety disorder.” Hypotheses of this sort
require further refinement.

² Rashi was a biblical scholar who introduced inquiry methods in the eleventh century.
He wrote extensive commentaries, produced queries, explanations, interpretations and
discussions of each phrase and verse of the Bible. Rashi’s written commentary on the
Bible made it more comprehensible for everyday scholars. Today, these and other
commentaries, assembled in the Talmud, have been extended to nearly 40 volumes and
continue as a source of biblical law [12].

Rashi provides several data collection tools to enable students to confirm or refute
their hypotheses and resolve open questions. For example, students might interview
the patient about symptoms (Fig. 12), perform an examination (Fig. 13), and request
medical history or lab tests. Data help students eliminate or support hypotheses and
assess the evidence that bears on their hypotheses independently of a teacher’s
input.

The Inquiry Notebook supports student data collection and helps record open questions
and hypotheses (Fig. 14). As data reveal flaws in hypotheses, students revise the
hypotheses and change their opinions of how strongly the data support or refute them.
Once the student is oriented to the goal of the case and uses the data gathering
tools, she records meaningful units of data, or propositions, keeping track of where
propositions come from (i.e., citing the sources) and indicating relationships between
propositions by linking them with supports/refutes links. Finally, these chains of
relationships terminate in hypotheses.
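
The notebook structure described above can be pictured with a small data-structure sketch. This is not Rashi's implementation: the class names, field names and the example links are all hypothetical, and the supports/refutes judgements are illustrative only, not medical claims.

```python
from dataclasses import dataclass, field


@dataclass
class Proposition:
    """One unit recorded in the notebook (all field names are hypothetical)."""
    text: str
    kind: str          # "observation", "inference" or "hypothesis"
    source: str = ""   # where the proposition came from, e.g. "interview"


@dataclass
class Notebook:
    propositions: list = field(default_factory=list)
    links: list = field(default_factory=list)  # (from_index, relation, to_index)

    def add(self, text, kind, source=""):
        """Record a proposition and return its index."""
        self.propositions.append(Proposition(text, kind, source))
        return len(self.propositions) - 1

    def link(self, src, relation, dst):
        """Connect two propositions with a supports/refutes link."""
        if relation not in ("supports", "refutes"):
            raise ValueError("relation must be 'supports' or 'refutes'")
        self.links.append((src, relation, dst))

    def evidence_for(self, hypothesis):
        """Return (supporting, refuting) texts linked directly to a hypothesis."""
        supporting = [self.propositions[s].text
                      for s, rel, d in self.links
                      if d == hypothesis and rel == "supports"]
        refuting = [self.propositions[s].text
                    for s, rel, d in self.links
                    if d == hypothesis and rel == "refutes"]
        return supporting, refuting


nb = Notebook()
nervous = nb.add("Increasing nervousness", "observation", "case statement")
weight = nb.add("Lost 20 pounds despite a good appetite", "observation",
                "case statement")
anxiety = nb.add("Anxiety disorder", "hypothesis")
nb.link(nervous, "supports", anxiety)   # illustrative judgement only
nb.link(weight, "refutes", anxiety)     # illustrative judgement only
print(nb.evidence_for(anxiety))
```

Keeping the source on every proposition is what lets a reviewer trace each chain of links back to the tool or document it came from.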

Students ask the electronic Coach for an assessment of their work and their argu- 
ment supporting or eliminating a hypothesis. The Coach analyzes the student’s In- 
quiry Notebook and history of activities and gives feedback about how best to pro- 
ceed. A Bayesian Belief Network (BBN), the basis of the Coach’s performance, 
comments about the syntactic structure of the student’s argument (Does the student 
understand the difference between data and hypotheses?) and its semantic content 
(Are inferences and conclusions supported by data and medical knowledge?). 

As each student moves through the inquiry cycle, the tutor follows her reasoning 
by matching it with an expert’s assessment of the medical case. As an example, con- 
sider a student who interviewed the patient, recorded important symptoms, “per- 
formed” a medical exam and identified salient symptoms and lab results. The student 
isolated data that supported, refuted, or had no bearing on a given hypothesis. The 
student read medical source documents and studied the patient’s signs and symptoms. 
The tutor tracked student activities and issued prompts through the expert system if 
the student failed to explore some range of information. Rashi responded to the stu- 
dent with carefully crafted questions, never revealing directly the solution to the case. 
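
The prompting behaviour in the example above can be approximated with a much simpler mechanism than the one the paper describes. The sketch below replaces the Bayesian Belief Network and expert system with a plain set comparison; the category names and question texts are hypothetical.

```python
# Hypothetical expert assessment: data categories an expert considers relevant.
EXPERT_RELEVANT = {"interview", "physical exam", "lab tests", "medical history"}

# Hypothetical question templates. Each points at an unexplored area
# without naming a diagnosis.
PROMPTS = {
    "interview": "What else could you ask the patient about her symptoms?",
    "physical exam": "Which parts of the examination have you not looked at yet?",
    "lab tests": "Could a laboratory result help you separate your hypotheses?",
    "medical history": "Is there anything in the patient's history worth checking?",
}


def coach_prompts(student_activities):
    """Return one question per relevant category the student has not explored."""
    explored = set(student_activities)
    missing = sorted(EXPERT_RELEVANT - explored)
    return [PROMPTS[category] for category in missing]


for question in coach_prompts(["interview", "physical exam"]):
    print(question)
```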



Fig. 14. The Inquiry Notebook. Student observations in the exam and the interview are
automatically recorded in the Inquiry Notebook. The student indicates the type of each
entry (observation, inference or hypothesis).

Finally, students type in their reports, including all the selected data, inferences
and hypotheses. These are sent electronically to the human teacher for evaluation. In 
some cases, a sequential review of all observations, hypotheses, data, and explana- 
tions is presented graphically, and can be edited and re-ordered to look for patterns. 
At some point each student makes a final submission which involves designating one 
hypothesis as the “best.” Then they turn in the Inquiry Notebook complete with all the 
competing hypotheses and their arguments for eliminating them. Rashi is being 
evaluated in both small colleges and large universities. 



7 Summary 

Artificial intelligence provides an opportunity to explore teaching methods that might 
be difficult to produce in traditional classrooms or that require excessive teacher time 
and resources. This paper suggested using intelligent tutors to generate individualized 
problems and hints and also to build support for inquiry learning. AI offers a way to 
test teaching methods by bringing a specific strategy (such as problem solving or 
inquiry learning) into a classroom and regulating its components, e.g., problem/hint 
selection, student performance and the interaction of performance with gender or 
cognitive development. The learning strategies are assessed by tracking student be- 
havior as a function of student characteristics, e.g., background and cognitive devel- 
opment. 

In sum, AI technology, along with cognitive science and web-technology, are the 
large poles in the tent, or the key components in a coming revolution in education. 
They help push the frontier of intelligent tutors toward new pedagogy and address 
inequities in the classroom. These systems use a model of the student to customize 
feedback and engage students. They explore the effectiveness of help for students of 
different genders and cognitive developments, and in one case, concluded that girls 
were more sensitive to the amounts of help while boys were affected by the level of 
abstraction. 

AI technology also facilitates development of an inquiry system that tracks, ana- 
lyzes and then comments on a student's selection of data and creation of hypotheses 
and inferences. We expect to identify the strategies students use to generate hypothe- 
ses and explore data. Intelligent tutor technology is almost at the point where we 
could say that the computer system is teaching a student to think! 


Abstract. ADDS (Approach to Document-based Development of Software) is
an approach to the development of applications based on a document-oriented
paradigm. According to this paradigm, applications are described by means of
documents that are marked up using descriptive domain-specific markup
languages. Afterwards, applications are produced by processing these marked-up
documents. Formulation of domain-specific markup languages in ADDS is a
dynamic and eminently pragmatic activity, since these languages evolve in
accordance with the authoring needs of the main actors that participate in the
development process (i.e. domain experts and developers). OADDS
(Operationalization in ADDS) is a processing model that promotes the
construction of modular language processors and their incremental evolution.
Thus, OADDS is specifically designed to cope with the evolutionary nature of the
domain-specific markup languages encouraged by ADDS. ADDS and OADDS have
been successfully applied to the development of applications in
knowledge-intensive domains (i.e. transport networks and educational
hypermedia). This paper also describes the advantages (incremental development
and improved maintenance) that this approach offers for the development of
knowledge-based systems.



1 Introduction 

The development of applications in general, and of Knowledge-based Systems 
(KBSs) in particular, can be considered as a linguistic activity arising from the col- 
laboration between the clients that have a problem to be solved, the domain experts 
with the knowledge required to solve that problem, the developers building the appli- 
cation, and the final users. Indeed, looking for ways to facilitate the communication 
between all the actors in this process is essential in order to guarantee a successful 
development. This is particularly true in the development of KBSs, where communi- 
cation between clients, knowledge engineers and developers has always been consid- 
ered to be critical. 

This paper describes our approach to application development, which we call the
document-oriented paradigm, and its specialization in the development of KBSs.
According to this paradigm, the building of an application begins by describing the
application using one or more documents, marking up these documents using a
domain-specific markup language, and finally, producing the application using a
suitable processor for this language. Thus, the development of a KBS using this
paradigm implies the provision of a document written in a natural language subset.
This document contains the knowledge that is going to be managed by the system. Then,
tags and attributes are pragmatically added to this document in accordance with a
previously defined markup language. These tags and attributes make the data and
knowledge structures relevant to the inference engine explicit. Finally, the
inference engine for the KBS is conceived as a processor driven by the markup.

The ADDS approach (Approach to Document-based Development of Software) is
an implementation of the document-oriented paradigm where the document types and
the languages used to mark them up evolve incrementally according to the needs of
domain experts and developers. OADDS (Operationalization in ADDS) is a processing
model for ADDS documents that introduces mechanisms used in the production
of modular processors to adapt to the evolutionary nature of languages in ADDS.

The rest of this paper is structured as follows. Section 2 describes the development 
of KBSs according to the document-oriented paradigm. Section 3 describes ADDS, 
the approach that implements this paradigm. Section 4 describes OADDS, the opera- 
tionalization model of ADDS. Section 5 describes some related work. Finally, section 
6 presents the conclusions and gives some ideas for future work. 



2 The Development of KBSs Using the Document-Oriented Paradigm

Documents play an important role in human communication. Therefore, the adoption 
of a document-oriented paradigm for the development of applications must be seen as 
a plausible alternative that could alleviate the communication problems arising among 
the different actors engaged in the software development process. 

The development and maintenance of KBSs are particularly sensitive to these
communication problems. Indeed, the knowledge acquisition problem is a critical
aspect of this type of system. The model-based approaches that arose during the
nineties (see [16] for a survey) conceive the solution to this problem as the explicit
formulation of a knowledge model capable of identifying and structuring the different
types of knowledge required to solve a problem, together with the roles played by
these types of knowledge in the reasoning process. Nevertheless, these approaches
usually distinguish between the model and its subsequent implementation. This means
that either the participants must provide an initially complete model (which is not
realistic even for small toy domains), or they must cope with the maintenance
problems derived from the translation of model changes into the implementation. This
scenario is similar to that arising in the domain of educational applications [5].

The document-oriented paradigm gives a pragmatic solution to the maintenance 
problem in the construction of model-based KBSs. Indeed, according to this para- 
digm: 

(a)

Jam Problem in Incorporation to M30.

The speed registered by the sensor S1 is low,
the speed registered by S2 also is low,
but the speed registered by S3 is normal.

(b)

<pattern>
<name>Jam Problem in Incorporation to M30.</name>
<body>
<and><measure>The <type>speed</type> registered by the sensor <sensor>S1</sensor> is <value>low</value></measure>,
<measure>the <type>speed</type> registered by <sensor>S2</sensor> also is <value>low</value></measure>,
but <measure>the <type>speed</type> registered by <sensor>S3</sensor> is <value>normal</value></measure>.</and>
</body>
</pattern>

Fig. 1. (a) Knowledge about an anomalous situation pattern in a traffic network, (b) markup of
the knowledge expressed in (a).

- The model leads to different domain-specific languages for describing the different
types of knowledge. Actually, these languages are suitable subsets of the natural
language similar to those used by the domain experts (see Fig. 1a). This similarity
makes it easier for the experts to express their knowledge as documents in natural
language.

- Then, using descriptive domain-specific markup languages, the structure of this
knowledge is made explicit. Thus, documents marked up using these languages are
prepared for automatic processing (see Fig. 1b). This initial markup can be
performed by the developers. But because of the simplicity and legibility of the
descriptive markup, domain experts can understand these documents and can
directly modify the knowledge described in them (the contents of such documents).
With the help or supervision of the developers, they could even extend
the language by adding new tags (either directly or using a specific editing tool).

- The developers, in turn, can include additional operational contents intended to
make the final processing of the knowledge possible. Examples of this situation
are problem-solving methods written in some suitable formal language, or
knowledge transformations given as document transformations.

- Finally, the implementation of the KBS is obtained by building a suitable processor
for the markup language used in the pragmatic markup of the document.

With the document-oriented paradigm, the implementation-model duality disap- 
pears, collapsing into documents where the knowledge provided by the experts and 
other additional knowledge mix together. Tags, attributes and structure are added by 
the developers and domain experts to make document processing possible. In addi- 
tion, the use of descriptive markup facilitates the incremental evolution of the lan- 
guages and documents. These domain-specific markup languages are not static, un- 
movable entities, but they can evolve according to changes in the needs of experts 
and/or developers, or when new markup needs are discovered as a consequence of 
model evolution. 

Fig. 2 sketches the structure of a KBS according to the document-oriented paradigm.
Such a structure is a generalization of the one arising in the arena of electronic
document processing based on descriptive markup technologies [6].

[Figure labels: Processing, Structure, Knowledge]

Fig. 2. Organization of a KBS according to the document-oriented paradigm.

The next sections analyze how to adapt the pragmatic nature of the document- 
oriented paradigm, either in the formulation of markup languages, or in the construc- 
tion of the processors for such languages. 




Fig. 3. Participants, activities and products in ADDS. 



3 The ADDS Approach 

ADDS [13] is an implementation of the document-oriented paradigm that is mainly
driven by the authoring needs of the people involved in the process of documenting 
the applications (domain experts and developers). Fig. 3 sketches the different prod- 
ucts, participants and activities involved in ADDS. The next subsections detail each 
of these aspects. 

3.1 Participants 

ADDS distinguishes two types of participants in the application development: 

- Domain experts. They are responsible for the provision and maintenance of the
application contents. In the KBS domain, they are the experts that provide the
different types of knowledge that will finally be included in the system.

- Developers. They are responsible for building the final application. In the KBS
domain, developers range from the knowledge engineers who design the knowledge
models to the programmers who develop the inference engine and other
software needed to produce the executable application.

According to the document-oriented paradigm, the interaction between these
participants is mediated by the final document to be produced (the application document).
For instance, during the initial stages of the development, developers interact with 
domain experts to decide the type and the form of the contents to be included in the 
application documentation. In addition, developers mark up these contents and assist 
the domain experts during the maintenance of the marked up document. 

3.2 Products 

According to ADDS, application construction involves the following types of prod- 
ucts: 

- The contents integrated in the application document. 

- The application document produced by marking up such contents with tags and 
attributes. 

- The description of the specific markup language used for the markup process of 
such contents. 

- The final application produced by processing the application document. 

3.3 Activities 

ADDS identifies the following activities in the application production: 

- Initial application conception. In this activity, the domain experts, assisted by the 
developers, conceive and produce an initial description of the application to be 
built. In the KBS construction, the developers help the experts to informally de- 
fine the set of documents required to describe all the knowledge needed by the 
system. 

- Markup. In this activity, the developers decide how to mark up the application
contents documents to obtain the application document. As a result, an explicit
description of the markup language (given by a schema or DTD) is produced,
together with the application document marked up with this language. Note that,
because of the iterative nature of ADDS, the markup language can evolve to
accommodate newly identified markup needs.

- Application construction. In this activity, the developers produce the application 
from the application document. During this activity, developers can add new op- 
erational contents to the document and mark up such contents. Finally, they pro- 
duce the application following the OADDS model introduced in the next section. 

- Maintenance. In this activity, the experts, assisted by the developers, perform
suitable modifications on the marked-up contents. These modifications are driven by
the evaluation of the application produced at previous stages. The domain experts
can even master the markup language, thus being able to use this language to add
new markup. This situation is promoted by the use of descriptive markup, focused
on content structure and providing domain-significant tag names and attributes. In
addition, during this activity the need for introducing new contents may arise (in
the case of a KBS, to introduce new knowledge as a consequence of an evolution
in the model). This, in turn, entails an evolution of the markup language used,
which is carried out by the developers and approved by the domain experts.
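
The evolution of a markup language during maintenance can be sketched as follows. The content model is written as a plain dictionary rather than a real schema or DTD, and the `<or>` tag, the `violations` function and the `evolve` function are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each tag maps to the child tags it may contain.
LANGUAGE_V1 = {
    "pattern": {"name", "body"},
    "body": {"and"},
    "and": {"measure"},
    "measure": {"type", "sensor", "value"},
    "name": set(), "type": set(), "sensor": set(), "value": set(),
}


def violations(node, language):
    """List the places where a document departs from the content model."""
    if node.tag not in language:
        return [f"unknown tag <{node.tag}>"]
    problems = []
    for child in node:
        if child.tag not in language[node.tag]:
            problems.append(f"<{child.tag}> not allowed inside <{node.tag}>")
        problems.extend(violations(child, language))
    return problems


def evolve(language, tag, children, allowed_in):
    """Return a new content model extended with one tag; the old one is kept."""
    evolved = {t: set(c) for t, c in language.items()}
    evolved[tag] = set(children)
    for parent in allowed_in:
        evolved[parent].add(tag)
    return evolved


# A document that needs a tag the first version of the language lacks.
DOC = ("<pattern><name>n</name><body><or><measure><type>speed</type>"
       "<sensor>S1</sensor><value>low</value></measure></or></body></pattern>")
root = ET.fromstring(DOC)

print(violations(root, LANGUAGE_V1))
LANGUAGE_V2 = evolve(LANGUAGE_V1, "or", {"measure"}, allowed_in=["body"])
print(violations(root, LANGUAGE_V2))
```

Returning a new model instead of mutating the old one keeps documents written against the earlier version checkable against it.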



4 The OADDS Operationalization Model 

OADDS is the operationalization model used in ADDS to produce applications from
marked-up documents. OADDS is based on the classical techniques for the construction
of language processors based on syntax-directed translation [1]. Thus, OADDS
conceives operationalization as the processing of the application document with an
appropriate language processor, that is, a processor specifically built for the
markup language used to mark up the application document. But, because of the
evolutionary nature of the markup languages used in ADDS, OADDS establishes
mechanisms to obtain modular processors from components. These components can be
extended and combined according to the markup language evolution. Despite being
independent of specific implementation technologies, OADDS is naturally implemented
as an object-oriented framework. Finally, the document-oriented paradigm itself can
be applied in the construction of OADDS processors. So, it is possible to describe
processors as a collection of marked-up documents. The contents of these documents
will be the code associated with the basic semantic actions required during the
processing of the application documents, while the markup will establish how to
combine these actions to produce the final processor (from this point of view, OADDS
extends and combines similar approaches used in languages such as YACC and XSL [19]).
Thus, the complete documentation of an application could include not only the
document describing it, but also the documentation of the processor used to process
the application document.

Fig. 4 sketches the different products and activities involved in OADDS. All these 
activities are carried out by the developers (their presence is omitted). The following 
subsections detail each one of the aspects depicted in Fig. 4. 

Fig. 4. Products and activities in OADDS.

4.1 Products

In addition to the application document and the description of the language used to 
mark up this document, OADDS introduces the processor of this language, together 
with a repository of operational components that facilitates the modular construction 
and the evolution of this processor. 

[Figure labels: Application Document, Document Tree Construction, Document Tree,
Attributed Tree, Operationalized Attributed Tree]

Fig. 5. Outline of the information flow in OADDS processors.

Fig. 5 sketches the information flow inside the OADDS processors (which can be
implemented in a modular way using the appropriate operational components). The
key object of such processing is the attributed tree. Each node of this tree is
associated with a set of attributes, each one having a value. The processing starts
by building the tree representing the application document. The construction of this
tree can be carried out using any of the usual parsing frameworks for structured
documents [3]. Next, the attributed tree is iteratively passed through a tree
operationalization stage followed by a tree evaluation stage. During the tree
operationalization stage, each node in the tree is decorated with (i) a controller,
which is a procedure determining the evaluation order for the neighbours of the node,
and (ii) an initializer, an advancer and a finalizer, which are the procedures
organizing the local processing of this node. During the evaluation stage, the
procedures decorating the tree are applied in the right order. Basically, this stage
consists of a tree traversal commanded by the controllers. The modularity of the
model is obtained thanks to the possibility of extending these procedures. The
extensions will be devoted to propagating new attribute values in the tree, and to
interrupting the evaluation when errors or other abnormal conditions are discovered.
These adaptation and extension capabilities are essential in order to simplify the
development and maintenance of complex systems, such as KBSs.
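
The two stages described above can be sketched in a few lines. This is one possible reading of the model, not the OADDS framework itself: the `Node` class, the function names, and the convention that the initializer runs before the children, the advancer after each child and the finalizer at the end are assumptions made for the illustration.

```python
class Node:
    """A node of the attributed tree (hypothetical minimal representation)."""

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = list(children)
        self.attributes = {}


def in_order(node):
    """Default controller: visit the children from left to right."""
    return node.children


def operationalize(node, components):
    """Operationalization stage: decorate each node with its procedures."""
    procs = components.get(node.tag, {})
    node.controller = procs.get("controller", in_order)
    node.initializer = procs.get("initializer", lambda n: None)
    node.advancer = procs.get("advancer", lambda n, child: None)
    node.finalizer = procs.get("finalizer", lambda n: None)
    for child in node.children:
        operationalize(child, components)


def evaluate(node):
    """Evaluation stage: a traversal commanded by the controllers."""
    node.initializer(node)
    for child in node.controller(node):
        evaluate(child)
        node.advancer(node, child)
    node.finalizer(node)


visited = []


def leaf(value):
    def init(node):
        visited.append(node.tag)
        node.attributes["value"] = value
    return {"initializer": init}


def and_init(node):
    node.attributes["value"] = True


def and_advance(node, child):
    node.attributes["value"] = (node.attributes["value"]
                                and child.attributes["value"])


def until_false(node):
    """Controller that interrupts the traversal once the result is False."""
    for child in node.children:
        yield child  # the child is evaluated and advanced before resuming
        if not node.attributes["value"]:
            return


components = {
    "true": leaf(True),
    "false": leaf(False),
    "and": {"controller": until_false,
            "initializer": and_init,
            "advancer": and_advance},
}

tree = Node("and", [Node("true"), Node("false"), Node("true")])
operationalize(tree, components)
evaluate(tree)
print(tree.attributes["value"], visited)
```

Because the controller is a generator, it can inspect the attributes propagated so far and stop the traversal early, which mirrors the interruption of evaluation mentioned in the text.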

4.2 Activities

OADDS introduces the following two activities into the application development: 

- Provision of the processor. In this activity the processor used to execute the
application documented by the application document is provided. Usually such a
processor has been previously constructed for a similar application, so it will be
reused on the new application document. If the new application document uses
new markup structures, the old processor will be extended by adding
new operational components for dealing with the new structures and with the
extensions of existing ones. Only at the initial stages of the development of a new
type of application will the implementation of a new processor from scratch be
mandatory and, even in this case, the provision of operational components that
can be reused in the construction of new processors will pay off in the long run.

- Application construction. The application arises as the result of processing the
application document with its processor.
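
The reuse and extension of a processor described in the first activity can be pictured as operations on a repository of components. The sketch below is hypothetical throughout: the repository layout (tag, then role, then procedure), the `extend` function and the `alarm` tag are assumptions, not part of OADDS.

```python
def extend(repository, tag, role, extension):
    """Return a new repository in which `extension` runs after the existing
    procedure (if any) registered for the given tag and role."""
    previous = repository.get(tag, {}).get(role)

    def combined(*args):
        if previous is not None:
            previous(*args)
        extension(*args)

    extended = {t: dict(procs) for t, procs in repository.items()}
    extended.setdefault(tag, {})[role] = combined
    return extended


log = []


def record_measure(node):
    log.append(f"measure:{node}")


def check_sensor(node):
    log.append(f"check:{node}")


def handle_alarm(node):
    log.append(f"alarm:{node}")


# Processor built for an earlier application.
base = {"measure": {"finalizer": record_measure}}
# Extension of an existing structure, then support for a new structure.
step1 = extend(base, "measure", "finalizer", check_sensor)
step2 = extend(step1, "alarm", "finalizer", handle_alarm)

step2["measure"]["finalizer"]("S1")
step2["alarm"]["finalizer"]("A7")
print(log)
```

The original repository is left untouched, so the earlier application can keep using its processor while the new one uses the extended version.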



5 Related Work 

Descriptive markup languages were introduced as a convenience for the processing of 
electronic documents [6]. HyTime [9], an SGML [6] extension devised to deal with 
the design and construction of hypermedia applications, demonstrated that in some 
domains, these kinds of languages could be used for describing applications in terms 
of documents that, in turn, could be processed for building the final application. 
Moreover, proposals like DSSSL [8] proved that this document-oriented paradigm 
could be used not only for the applications, but also for describing the processors used 
to produce the applications. XML [19] and its related technologies have generalized 
the use of descriptive markup languages as a standard way for information inter- 
change between applications and for many other uses. Indeed, there are several pro- 
posals for applying markup languages to the KBSs domain (see [2]). Note that most 
of these approaches conceive markup languages as static entities. ADDS takes a more 
pragmatic position because markup languages are considered as dynamic objects that 
evolve when the contents or the markup needs of these contents change. OADDS 
gives an operational solution to this dynamic nature of the languages, encouraging the 
construction of modular processors from components that can be extended and 
adapted according to markup language evolution. 

ADDS shares many features with the approach to software development based on 
Domain-Specific Languages (DSLs [17]). The main difference is that, while these 
kinds of languages are, in essence, specific purpose programming languages, ADDS 
follows a document-oriented paradigm, more suitable for content intensive applica- 
tions, such as KBSs, where there is a clear distinction between contents and the lan- 
guages used to structure such contents. 

Modular language processor construction has been popularized by the functional
programming community, where the main approach is based on monads and monad
transformers [7], although proposals in the object-oriented paradigm (based on the
use of mixins [4]) and in the attribute grammar approach to the construction of
language processors can also be found [18]. OADDS semantic modularity mechanisms
are inspired by these proposals, and also resemble the extension mechanisms of
methods in CLOS [15]. Indeed, the extensions of initializers, advancers and finalizers
are similar to the definition of before, around and after methods in CLOS. In this
sense controllers are analogous to primary methods.

ADDS generalizes the methods for the construction of educational applications for
foreign language text comprehension presented in [5]. ADDS also generalizes the
approach for the generation of hypermedia prototypes from XML documents describing
the hypermedia contents and navigation presented in [10]. Work in [12] [13] [14]
shows the evolution of ADDS. Initially, in [12] [14] this approach was called DTC
(structured Documents, document Transformations and software Components). The
use of this approach for the construction of applications in the transport networks
domain (more precisely, subway networks) is described in [12]. Work in [11]
explores its use in the educational hypermedia domain.



6 Conclusions and Future Work 

This paper outlines the development of KBSs using a document-oriented paradigm. 
According to this paradigm, knowledge is initially described using documents formu- 
lated in the same language used by the domain experts: a subset of the natural lan- 
guage. Afterwards, domain-specific descriptive markup languages are used to make 
the structure of the knowledge described in these documents explicit. This makes its 
automatic processing possible. The ADDS approach, together with the OADDS
operationalization model, provides an implementation of this paradigm. The pragmatic
nature of ADDS implies that these markup languages are evolutionary, as a response
to the dynamic process of determining all the knowledge needed by KBSs. The
modularity and extensibility of the inference engines promoted by OADDS simplify the
maintenance and the updating of the final application. Moreover, they also simplify
the development of application families because, once all the basic components are
made available, it is very simple to produce new related applications.

As future work, it would be interesting to perform a more systematic study of the
markup process applied to knowledge documentation and of the cooperation between
domain experts and developers in this process. Also, a study of the viability of
knowledge acquisition tools based on ADDS / OADDS is needed. These tools will
facilitate the editing and markup of knowledge documents.


Abstract. As a result of the use of OLAP technology in new fields of knowledge
and the merging of data from different sources, it has become necessary for
models to support this technology. In this paper, we propose a new
multidimensional model that can manage imprecision both in dimensions and facts.
Consequently, the multidimensional structure is able to model data imprecision
resulting from the integration of data from different sources or even information
from experts, which it does by means of fuzzy logic.



1 Introduction 

Ever since the appearance of the OLAP technology ([5]), there have been various 
proposals to support its special needs, and in particular, two different approaches have 
been documented. The first of these extends the relational model to support the
structures and operations which are typical of OLAP, and the first proposal of
this type can be found in [9]. Since then, there have been other proposals (e.g. [10]), and most
of the present relational systems include extensions to represent datacubes and operate 
on them. The second approach is to develop new models using a multidimensional 
view of the data. Many authors have proposed models in this way ([1, 3, 4, 12]). 

In the early 70s, the need for flexible models and query languages to manage the 
ill-defined nature of information in DSS was identified ([8]). Nowadays, the
application of the OLAP technology to other knowledge fields (e.g. medical data) and the use
of semi-structured sources (e.g. XML) and non-structured sources (e.g. plain text) has
made these requirements on the models even more important. The systems now need
to manage imprecision in the data, and more flexible structures are needed to
represent the analysis domain. New models have appeared to manage incomplete
datacubes ([7]), imprecision in the facts ([11]), and the definition of facts using
different levels in the dimensions ([13]). However, these models continue to use rigid
hierarchies and this makes it extremely difficult for certain domains to be modelled.
Consequently, this could result in the loss of information when we need to merge data 
from different sources with incompatibilities in their schemata. 
In this paper, we propose a new multidimensional model which is able to handle
imprecision in hierarchies and facts by using fuzzy logic. The use of fuzzy hierarchies 
enables the structures of the dimensions to be defined to the final user more intui- 
tively, thereby allowing a more intuitive use of the system. Furthermore, this allows 
information to be merged from different sources with incompatibilities in their struc- 
tures, or even information given by experts to be used in order to improve the multi- 
dimensional schema. In the next section, we shall introduce classical multidimen- 
sional models as an introduction to presenting our approach. Then, in the third section 
we shall include an example of the structure proposed to show how to apply the op- 
erations on the multidimensional structure. The final section presents the main con- 
clusions and future work. 



2 Multidimensional Model 

In this section, we shall present our proposed multidimensional model. Firstly, we 
shall introduce what we have called the classical models (these being the first docu- 
mented models). Secondly, we shall define the multidimensional structure for manag- 
ing imprecision. We shall then include the basic operations on the multidimensional 
models (roll-up, drill-down, dice, slice and pivot), and show how these are applied on 
the fuzzy structure. 



2.1 Classical Multidimensional Models 

In classical multidimensional models, we can distinguish two different types of data: 
on one hand, we have the facts being analysed, and on the other, the dimensions are 
the context for the facts. Hierarchies may be defined in the dimensions. The different 
levels of the dimensions allow us to access the facts at different levels of granularity. 
In order to do so, classical aggregation operators are needed (maximum, minimum, 
average, etc). 

The defined hierarchies use many-to-one relations, so one element in a level can 
only be grouped by a single value of each upper level in the hierarchy. This makes the 
final structure of a datacube rigid and well defined in the sense that given two values 
of the same level in a dimension, the set of facts relating to these values have empty 
intersection. 

The normal operations (roll-up, drill-down, dice, slice and pivot) are defined on 
this structure. 



2.2 Multidimensional Structure 

Definition 1. A dimension is a tuple $d = (l, \leq_d, l_\perp, l_\top)$ where $l = \{l_i,\ i = 1,\dots,n\}$ such that
each $l_i$ is a set of values and $l_i \cap l_j = \emptyset$ if $i \neq j$, and $\leq_d$ is a partial order relation between
the elements of $l$. $l_\perp$ and $l_\top$ are two elements of $l$ such that $\forall l_i \in l$, $l_\perp \leq_d l_i$ and
$l_i \leq_d l_\top$.
All = {All} = $l_\top$

Legal age = {Yes, No}          Group = {Young, Adult, Old}

Age = {1,...,100} = $l_\perp$

Fig. 1. Example of an age hierarchy.



Each element $l_i$ is called a level. In order to identify level $l$ of dimension $d$, we shall
use $d.l$. The two special levels $l_\perp$ and $l_\top$ shall be called the base level and top level,
respectively. The partial order relation in a dimension gives the hierarchical relation
between the levels.

In Figure 1, you can see a definition of an age hierarchy. The definition of the
dimension as we have presented it would be Age = ({Age, Group, Legal age, All}, $\leq_{Age}$,
Age, All), with the relation $\leq_{Age}$ given by: Age $\leq_{Age}$ Age, Group $\leq_{Age}$ Group,
Legal age $\leq_{Age}$ Legal age, All $\leq_{Age}$ All, Age $\leq_{Age}$ Group, Age $\leq_{Age}$ Legal age,
Age $\leq_{Age}$ All, Group $\leq_{Age}$ All and Legal age $\leq_{Age}$ All.

Definition 2. For each dimension $d$, the domain is $dom(d) = \bigcup_i l_i$.

In the above example, the domain of the dimension Age is dom(Age) = {1,...,100,
Young, Adult, Old, Yes, No, All}.

Definition 3. For each $l_i$, we define the set

$$H_{l_i} = \{\, l_j \mid l_j \neq l_i \wedge l_j \leq_d l_i \wedge \neg\exists\, l_k \;\; l_j \leq_d l_k \leq_d l_i \,\} \qquad (1)$$

where $l_k$ ranges over the levels other than $l_i$ and $l_j$, and we call this the set of children of level $l_i$.

Using the same example of the dimension on the ages, the set of children of the
level All is $H_{All}$ = {Group, Legal age}. In all the dimensions we define, for the base
level, this set will always be the empty set, as you can see from the definition.

Definition 4. For each $l_i$, we define the set

$$P_{l_i} = \{\, l_j \mid l_j \neq l_i \wedge l_i \leq_d l_j \wedge \neg\exists\, l_k \;\; l_i \leq_d l_k \leq_d l_j \,\} \qquad (2)$$

where $l_k$ again ranges over the levels other than $l_i$ and $l_j$, and we call this the set of parents of level $l_i$.

On the hierarchy we have defined, the set of parents of level Age is $P_{Age}$ = {Legal
age, Group}. In the case of the top level of a dimension, this set will always be the
empty set.

Definition 5. For each pair of levels $l_i$ and $l_j$ such that $l_j \in H_{l_i}$, we have the relation
$\mu_{ij} : l_i \times l_j \to [0,1]$, and we call this the kinship relation.

The degree of inclusion of the elements of a level in the elements of their parent
levels can be defined using this relation. If we only use the values 0 and 1 and one
element is only included with degree 1 for a single element of its parent levels, this
relation represents a crisp hierarchy. Following the example, the relation between the
levels Legal age and Age is of this type. The kinship relation in this situation is

$$\mu_{LegalAge,Age}(Yes, x) = \begin{cases} 1 & \text{if } x \in [18,100] \\ 0 & \text{in other case} \end{cases} \qquad \mu_{LegalAge,Age}(No, x) = \begin{cases} 1 & \text{if } x \in [1,17] \\ 0 & \text{in other case} \end{cases} \qquad (3)$$



If we relax these conditions and allow values to be used in the interval [0,1] without
any other limitation, we have a fuzzy hierarchical relation. This allows several
hierarchical relations to be represented more intuitively. An example can be seen in 
Figure 2 where we present the group of ages according to linguistic labels. Further- 
more, this fuzzy relation allows hierarchies to be defined in which there is impreci- 
sion in the relationship between elements in different levels. In this situation, the 
value in the interval shows the degree of confidence in the relation. 
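The sets of children and parents of Definitions 3 and 4 can be checked mechanically on the age hierarchy. The following is a minimal sketch in Python, not part of the original proposal: the names `LEVELS`, `ORDER`, `children` and `parents` are illustrative, and the order relation is simply the list of pairs given for the Age dimension in the text.

```python
# Levels of the Age dimension and its partial order, as listed in the text.
LEVELS = ["Age", "Group", "Legal age", "All"]

# Pairs (x, y) meaning x <=_Age y; reflexive pairs are included.
ORDER = {(level, level) for level in LEVELS} | {
    ("Age", "Group"), ("Age", "Legal age"), ("Age", "All"),
    ("Group", "All"), ("Legal age", "All"),
}


def leq(x, y):
    """True when level x is below or equal to level y in the hierarchy."""
    return (x, y) in ORDER


def children(li):
    """Levels directly below li: no third level lies strictly in between."""
    return {
        lj for lj in LEVELS
        if lj != li and leq(lj, li)
        and not any(lk not in (li, lj) and leq(lj, lk) and leq(lk, li)
                    for lk in LEVELS)
    }


def parents(li):
    """Levels directly above li: no third level lies strictly in between."""
    return {
        lj for lj in LEVELS
        if lj != li and leq(li, lj)
        and not any(lk not in (li, lj) and leq(li, lk) and leq(lk, lj)
                    for lk in LEVELS)
    }


print(children("All"))   # the two levels directly under All
print(parents("Age"))    # the two levels directly above Age
```

On this hierarchy the base level has no children and the top level has no parents, in agreement with the remarks that follow each definition.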



Fig. 2. Kinship relation between levels Group and Age (labels shown: Young, Adult, Old).



Definition 6. For each pair of levels $l_i$ and $l_j$ of the dimension $d$ such that
$l_j \leq_d l_i \wedge l_j \neq l_i$, the relation $\eta_{ij} : l_i \times l_j \to [0,1]$ is defined as

$$\eta_{ij}(a,b) = \begin{cases} \mu_{ij}(a,b) & \text{if } l_j \in H_{l_i} \\ \bigoplus_{l_k \in H_{l_i}} \bigoplus_{c \in l_k} \left( \mu_{ik}(a,c) \otimes \eta_{kj}(c,b) \right) & \text{in other case} \end{cases}$$

where $\oplus$ and $\otimes$ are a t-conorm and a t-norm, respectively, or operators from the families
MOM or MAM defined by Yager ([15]), which include the t-conorms and t-norms,
respectively. This relation is called the extended kinship relation.

This relation gives us information about the degree of relation between two values 
in different levels in the same dimension. In order to obtain this value, it considers all 
the possible paths between the elements in the hierarchy. Each one is calculated by 
aggregating the kinship relation between elements in two consecutive levels using a
t-norm. The final value is then the aggregation of the result of each path using a
t-conorm. By way of example, we will show how to calculate the value of
$\eta_{All,Age}(All, 25)$. In this situation, we have two different paths. Let us look at each:

• All - Legal age - Age. In Figure 3.a, you can see the two ways to get to 25 from
All passing through the level Legal age. The result of this path is $(1 \otimes 1) \oplus (1 \otimes 0)$.

• All - Group - Age. This situation is very similar to the previous one. In Figure
3.b, you can see the three different paths going through the level Group. The
result of this path is $(1 \otimes 0.7) \oplus (1 \otimes 0.3) \oplus (1 \otimes 0)$.

We must now aggregate these two values using a t-conorm in order to obtain the
result. If we use the maximum as the t-conorm and the minimum as the t-norm, the
result is $((1 \otimes 1) \oplus (1 \otimes 0)) \oplus ((1 \otimes 0.7) \oplus (1 \otimes 0.3) \oplus (1 \otimes 0)) = (1 \oplus 0) \oplus (0.7 \oplus 0.3 \oplus 0)$
$= 1 \oplus 0.7 = 1$, so the value of $\eta_{All,Age}(All, 25)$ is 1, which means that the age 25 is
grouped by All in the level All with grade 1.
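The calculation above can be sketched as a short recursive function. This is only an illustration under stated assumptions: the dictionaries `CHILDREN`, `VALUES` and `MU` are hypothetical encodings of the age hierarchy restricted to the single age 25, the kinship degrees are the ones appearing in the two paths above, and the assignment of 0.7 to Young and 0.3 to Adult is an assumption (the text only gives the three degrees, not their labels).

```python
from functools import reduce

# Children set of each level in the age hierarchy.
CHILDREN = {
    "All": ["Legal age", "Group"],
    "Legal age": ["Age"],
    "Group": ["Age"],
    "Age": [],
}

# Values of each level; Age is restricted to 25 to keep the example small.
VALUES = {
    "All": ["All"],
    "Legal age": ["Yes", "No"],
    "Group": ["Young", "Adult", "Old"],
    "Age": [25],
}

# Kinship relation mu[(parent, child)][(a, b)]; missing pairs count as 0.
MU = {
    ("All", "Legal age"): {("All", "Yes"): 1.0, ("All", "No"): 1.0},
    ("All", "Group"): {("All", "Young"): 1.0, ("All", "Adult"): 1.0,
                       ("All", "Old"): 1.0},
    ("Legal age", "Age"): {("Yes", 25): 1.0, ("No", 25): 0.0},
    # Assumed labelling of the degrees 0.7, 0.3 and 0 used in the text.
    ("Group", "Age"): {("Young", 25): 0.7, ("Adult", 25): 0.3, ("Old", 25): 0.0},
}


def eta(li, lj, a, b, tnorm=min, tconorm=max):
    """Extended kinship relation between value a of level li and b of lj."""
    if lj in CHILDREN[li]:
        return MU[(li, lj)].get((a, b), 0.0)
    # Combine every path through every child level: t-norm along a path,
    # t-conorm across paths. An empty list of paths yields 0.
    terms = [
        tnorm(MU[(li, lk)].get((a, c), 0.0), eta(lk, lj, c, b, tnorm, tconorm))
        for lk in CHILDREN[li]
        for c in VALUES[lk]
    ]
    return reduce(tconorm, terms, 0.0)


print(eta("All", "Age", "All", 25))  # maximum/minimum, as in the text
```

Other pairs of operators can be passed through the `tnorm` and `tconorm` parameters, for instance the product and the probabilistic sum.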
Fig. 3. Example of the calculation of the extended kinship relation: a) path All - Legal age -
Age; b) path All - Group - Age.



Definition 7. We say that any pair $(h, \alpha)$ is a fact when $h$ is an m-tuple on the
attributes domain we want to analyze, and $\alpha \in [0,1]$.

Uncertainty in the facts is managed by attaching a degree of certainty to each one.
This degree of certainty allows us to use values in the analysis that
might be interesting to the decision maker but which imply imprecision. The value $\alpha$ of each
pair controls the influence of the fact in the analysis.

Definition 8. An object of type history is the recursive structure

$$H = \begin{cases} \Omega \\ (A, l_b, F, G, H') \end{cases} \qquad (5)$$

where $\Omega$ is the recursivity clause, $F$ is the fact set, $l_b$ is a set of levels $(l_{1b},\dots,l_{nb})$, $A$ is
an application from $l_b$ to $F$, $G$ is an aggregation operator, and $H'$ is a structure of type
history.

The role of this structure will be clear after the operations have been defined in the 
next section. 

Definition 9. A datacube is a tuple $C = (D, l_b, F, A, H)$ such that $D = (d_1,\dots,d_n)$ is a set of
dimensions, $l_b = (l_{1b},\dots,l_{nb})$ is a set of levels such that $l_{ib}$ belongs to $d_i$, $F = R \cup \emptyset$ where
$R$ is the set of facts and $\emptyset$ is a special symbol, $H$ is an object of type history, and $A$ is
an application defined as $A : l_{1b} \times \dots \times l_{nb} \to F$, giving the relation between the
dimensions and the facts defined.

If for $\vec{a} = (a_1,\dots,a_n)$, $A(\vec{a}) = \emptyset$, this means that no fact is defined for this
combination of values.

Definition 10. We say that a datacube is basic if $l_b = (l_{1\perp},\dots,l_{n\perp})$ and $H = \Omega$.

Having defined the structure, we shall now show how to translate a multidimen- 
sional schema into our model. An example of a multidimensional model is shown in 
Figure 4. In this schema, we want to analyze the sales in a company. The broken lines 
represent the fuzzy relation between the levels, i.e. the relations take values in the 
entire interval [0,1]. It is possible to see how three dimensions are considered: Time, 
Product and Customer. This schema translated into our model corresponds to
$C_{sale}$ = ({Customer, Product, Time}, (Age, Product, Date), {(price, amount)} $\cup\ \emptyset$, A, $\Omega$).
In order to complete the definition, we need the dimension structures:
Customer = ({Age, Legal Age, Group, All}, $\leq_{Customer}$, Age, All),
Product = ({Product, Category, Provider, Quality, All}, $\leq_{Product}$, Product, All),
Time = ({Date, Month, Holiday, All}, $\leq_{Time}$, Date, All), and
the application A that gives the relation between the dimensions and the facts:
A: Age $\times$ Product $\times$ Date $\to$ {(price, amount)} $\cup\ \emptyset$.



2.3 Operations 

Once we have defined the multidimensional structure, we need the basic operations to 
work with it. In this section, we shall define the operations to change the level in the 
hierarchies (roll-up and drill-down) as well as the selection (dice), projection (slice) 
and pivot. First, two preliminary concepts are needed. 

Definition 11. An aggregation operator is a function $G(B)$ where
$B = \{(h, \alpha) \mid (h, \alpha) \in F\}$ and the result is a tuple $(h', \alpha')$.

The parameter of an aggregation operator can be seen as a fuzzy bag ([6]) since it
concerns a collection of elements (the facts) which can be repeated, with each having
a value in the [0,1] interval (the $\alpha$ defined in the tuples).

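Definition 12. For each value $a$ in a level $l_i$, we have the set

$$F_a = \begin{cases} \bigcup_{l_j \in H_{l_i}} \;\bigcup_{b \in l_j,\ \mu_{ij}(a,b) > 0} F_b & \text{if } l_i \neq l_b \\ \{\, h \mid h \in F \wedge \exists a_1,\dots,a_n \;\; A(a_1,\dots,a,\dots,a_n) = h \,\} & \text{if } l_i = l_b \end{cases}$$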

This set includes all the facts that are in any way related to value a, and this is all 
we need to introduce the operations and to apply them on the fuzzy multidimensional 
structure proposed. 




Fig. 4. Example of multidimensional schema. 

Definition 13. The result of applying roll-up on dimension $d_i$, level $l_r$ ($l_r \neq l_b$), using the
aggregation operator $G$ on a datacube $C = (D, l_b, F, A, H)$ is another datacube
$C' = (D, l'_b, F', A', H')$ where $l'_b = (l_{1b},\dots,l_r,\dots,l_{nb})$,

$$A'(a_1,\dots,a,\dots,a_n) = G(\{\, (h, \alpha \otimes \eta_{rb}(a,c)) \mid (h,\alpha) \in F \wedge A(a_1,\dots,c,\dots,a_n) = (h,\alpha) \,\}),$$

$F'$ is the range of $A'$, and $H' = (A, l_b, F, G, H)$.
Definition 14. The result of applying drill-down on a datacube $C = (D, l_b, F, A, H)$ having
$H = (A', l'_b, F', G', H')$ is another datacube $C' = (D, l'_b, F', A', H')$.

After the definition of the drill-down operation, we can see the role of the structure 
history inside our proposal. This recursive structure enables us to return at any time to 
the previous state before the roll-up was applied. Consequently, loss of information is 
prevented as you progress up the hierarchy. 
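The interplay between roll-up, the history structure and drill-down can be illustrated with a small sketch. It is deliberately restricted to a single dimension and rests on several assumptions that are not in the original text: the class `Cube`, the function names, the operator `weighted_average` (a stand-in for an aggregation operator on fuzzy bags, not one of the operators cited in this paper) and the kinship degrees in `ETA` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Fact = Tuple[float, float]  # (h, alpha): measured value and degree of certainty


@dataclass
class Cube:
    level: str                            # current level l_b of the dimension
    cells: Dict[object, Optional[Fact]]   # the application A; None means no fact
    history: Optional[tuple] = None       # None plays the role of Omega


def weighted_average(bag):
    """Hypothetical operator G: average of h weighted by the certainty degrees."""
    total = sum(w for _, w in bag)
    if total == 0:
        return None
    return (sum(h * w for h, w in bag) / total, max(w for _, w in bag))


def roll_up(cube, target_level, eta, G, tnorm=min):
    """Roll-up on one dimension; eta[a][c] relates value a of the target level
    to value c of the current level (extended kinship relation)."""
    new_cells = {}
    for a, degrees in eta.items():
        bag = [(cube.cells[c][0], tnorm(cube.cells[c][1], degree))
               for c, degree in degrees.items()
               if cube.cells.get(c) is not None]
        new_cells[a] = G(bag)
    # The previous state is stored so that drill-down can restore it.
    return Cube(target_level, new_cells,
                history=(cube.cells, cube.level, G, cube.history))


def drill_down(cube):
    """Return to the state stored in the history before the last roll-up."""
    if cube.history is None:
        raise ValueError("basic datacube: there is no roll-up to undo")
    cells, level, _operator, previous = cube.history
    return Cube(level, cells, previous)


# Two facts at the base level Age, both with certainty 1 (made-up data).
base = Cube("Age", {20: (4.0, 1.0), 30: (2.0, 1.0)})
# Made-up kinship degrees between Group and Age.
ETA = {"Young": {20: 1.0, 30: 0.5}, "Adult": {20: 0.0, 30: 1.0}}

rolled = roll_up(base, "Group", ETA, weighted_average)
restored = drill_down(rolled)
print(rolled.cells)
print(restored.level, restored.cells)
```

Because each roll-up pushes the previous application, level and operator onto the history, any number of roll-ups can be undone one at a time, which is the behaviour described above.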



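Definition 15. The result of applying dice with the condition $\beta$ on level $l_r$ of
dimension $d_i$ in a datacube $C = (D, l_b, F, A, H)$ is another datacube $C' = (D', l'_b, F', A', \Omega)$ where
$D' = (d_1,\dots,d'_i,\dots,d_n)$ with $d'_i = (l', \leq_{d_i}, l_b, l_\top)$, having $l' = \{\, l'_j \mid l_b \leq_{d_i} l_j \,\}$ and

$$d'_i.l'_j = \begin{cases} \{\, v \mid v \in l_r \wedge \beta(v) \,\} & \text{if } l_j = l_r \\ \{\, v \mid v \in d_i.l_j \wedge \exists x \in l_r \;\; \beta(x) \wedge \eta_{rj}(x,v) > 0 \,\} & \text{if } l_j \leq_{d_i} l_r \\ \{\, v \mid v \in d_i.l_j \wedge \exists x \in l_r \;\; \beta(x) \wedge \eta_{jr}(v,x) > 0 \,\} & \text{if } l_r \leq_{d_i} l_j \end{cases}$$

$$A'(a_1,\dots,a_i,\dots,a_n) = \{\, (h, \alpha \otimes \beta_{a_i}) \mid a_1 \in d'_1.l'_b \wedge \dots \wedge a_n \in d'_n.l'_b \wedge A(a_1,\dots,a_n) = (h,\alpha) \,\}$$

where $\beta_{a_i} = \bigoplus_{c \in d'_i.l'_r} \eta_{rb}(c, a_i)$, and $F'$ is the range of $A'$.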



Definition 16. The result of applying slice on dimension $d_i$ using the aggregation
operator $G$ in a datacube $C = (D, l_b, F, A, H)$ is another datacube $C' = (D', l'_b, F', A', \Omega)$
where $D' = (d_1,\dots,d_{i-1},d_{i+1},\dots,d_n)$, $l'_b = (l_{1b},\dots,l_{(i-1)b},l_{(i+1)b},\dots,l_{nb})$,

$$A'(a_1,\dots,a_{i-1},a_{i+1},\dots,a_n) = G(\{\, (h,\alpha) \mid \exists x \;\; A(a_1,\dots,a_{i-1},x,a_{i+1},\dots,a_n) = (h,\alpha) \,\}),$$

and $F'$ is the range of $A'$.

Definition 17. The result of applying pivot on dimensions $d_i$ and $d_j$ in a datacube
$C = (D, l_b, F, A, H)$ is another datacube $C' = (D', l'_b, F, A', \Omega)$ where
$D' = (d_1,\dots,d_{i-1},d_j,d_{i+1},\dots,d_{j-1},d_i,d_{j+1},\dots,d_n)$,
$l'_b = (l_{1b},\dots,l_{(i-1)b},l_{jb},l_{(i+1)b},\dots,l_{(j-1)b},l_{ib},l_{(j+1)b},\dots,l_{nb})$, and

$$A'(a_1,\dots,a_{i-1},a_j,a_{i+1},\dots,a_{j-1},a_i,a_{j+1},\dots,a_n) = A(a_1,\dots,a_{i-1},a_i,a_{i+1},\dots,a_{j-1},a_j,a_{j+1},\dots,a_n).$$

Although we now have the operations to work with the structure proposed, this 
structure can represent objects that are not suitable for the operations defined above. 
We must therefore say when a datacube is valid to work with it. 

Definition 18. A datacube is valid if it is basic or has been obtained by applying a 
finite number of operations on a basic datacube. 



2.4 User View 

We have presented a structure that manages imprecision by means of fuzzy logic. We 
need to use aggregation operators on fuzzy bags in order to apply some of the opera- 
tions presented. Most of the methods previously documented give a fuzzy set as a 
result. As this situation can make the result difficult to understand and use in a deci- 
sion process, we propose a two-layer model: one of the layers is the structure pre- 
sented in the previous section; and the other is defined on this, and its main objective 
is to hide the complexity of the model and provide the user with a more understand- 
able result. In order to do so, we propose the use of a fuzzy summary operator that 
gives a more intuitive result but which keeps as much information as possible. Using 
this type of operator, we shall define the user view. 
Definition 19. Given a summary operator $M$, we define the user view of a datacube
$C = (D, l_b, F, A, H)$ using $M$ as the structure $C_M = (D, l_b, F_M, A_M)$ where
$A_M(a_1,\dots,a_n) = M(A(a_1,\dots,a_n))$ and $F_M$ is the range of $A_M$.

We can define as many user views of a datacube as the number of summary opera- 
tors used. Therefore, each user can have their own user view with the most intuitive 
view of data according to their preferences by using a datacube. As an example of this 
type of operator, we can use the one proposed in [2]. This operator proposes the use
of the fuzzy number that best fits, in the sense of fuzziness, the fuzzy set or fuzzy bag. 



3 Example 

Once we have defined the fuzzy structure and the operations on it, we shall present an 
example of a simple multidimensional schema in order to show the application of 
operations on it. This example will be modelled using the classical multidimensional 
or crisp model to show the differences between both approaches. We will use the 
schema in Figure 4. 

In the fuzzy case, the dimension Customer is the fuzzy hierarchy on ages which we 
have used previously. The remaining elements in both the fuzzy and the crisp case are 
shown in Figure 5, with the exception of the partial order relations which are clear in 
the schema. Here we see the first differences between both approaches when we
model the levels group and holiday. In the crisp case, these concepts are modelled 
using intervals on the ages and dates, respectively. In our approach, we use linguistic 
labels. The facts used in the example and their relation with the values in the
dimensions are shown in Table 1. If the user wants to know “the average amount of sales at
Christmas for the different age groups and the quality of the provider”, the sequence
of operations to apply is:

Time

  All = {All} = $l_\top$     Months = {Dec-02, ..., Jan-03}     Holiday = {Christmas}
  Dates = {01-dec-02, ..., 31-jan-03} = $l_\perp$

Product

  All = {All} = $l_\top$     Category = {milk, other}     Provider = {P1, P2, P3}
  Quality = {Good, Medium, Bad}
  Product = {milk, bread, juice, cheese, meat} = $l_\perp$

Fuzzy Model

  $\mu_{Holiday,Date}$: [plot for the label Christmas; axis ticks: 1-dec, 22-dec, 6-jan, 15-jan]

  $\mu_{Quality,Provider}$:

    Provider   Good   Medium   Bad
    P1         1      0.3      0
    P2         0.2    1        0.2
    P3         0      0.3      1

  $\mu_{Provider,Product}$:

    Provider   Products
    P1         Milk, meat
    P2         Juice, cheese
    P3         Bread

Crisp Model

    Quality    Providers
    Good       P1
    Medium     P2
    Bad        P3

    Group      Ages
    Young      (0,25]
    Adult      ]25,65[
    Old        [65,100]

    Holiday    Dates
    Christmas  [22-dec, 6-jan]

Fig. 5. Dimension structures for the multidimensional schema.
1. dice on the dimension time, in the level holiday with the condition $\beta(x)$ = “x is
Christmas”.

2. roll-up in the dimension time and level holiday, dimension product and level 
quality and dimension customer and level group, using the aggregation operator 
average on the amount. 
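The effect of step 1 on the certainty of each fact can be sketched as follows. This is an illustration under an explicit assumption: the label Christmas is taken to be a trapezoid whose breakpoints are the four axis ticks of the $\mu_{Holiday,Date}$ plot in Figure 5 (1-dec, 22-dec, 6-jan, 15-jan); the text does not state the shape of the function, and the names `christmas` and `dice_alpha` are hypothetical.

```python
from datetime import date

# Assumed breakpoints of the trapezoid, read from the axis ticks of Figure 5.
SUPPORT_START = date(2002, 12, 1)
KERNEL_START = date(2002, 12, 22)
KERNEL_END = date(2003, 1, 6)
SUPPORT_END = date(2003, 1, 15)


def christmas(day):
    """Assumed degree to which a date belongs to the label Christmas."""
    if day <= SUPPORT_START or day >= SUPPORT_END:
        return 0.0
    if day < KERNEL_START:
        return (day - SUPPORT_START).days / (KERNEL_START - SUPPORT_START).days
    if day <= KERNEL_END:
        return 1.0
    return (SUPPORT_END - day).days / (SUPPORT_END - KERNEL_END).days


def dice_alpha(alpha, day, tnorm=min):
    """New certainty of a fact after the dice: old alpha combined with the degree."""
    return tnorm(alpha, christmas(day))


# Facts 1, 2 and 3 of Table 1 (dates 23-dec, 7-jan and 10-jan, alpha = 1).
for fact_no, day in [(1, date(2002, 12, 23)), (2, date(2003, 1, 7)),
                     (3, date(2003, 1, 10))]:
    print(fact_no, round(dice_alpha(1.0, day), 1))
```

Under this assumption, dates inside the interval [22-dec, 6-jan] keep their certainty, while dates shortly after 6-jan are kept with a reduced certainty instead of being discarded as in the crisp case.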

In order to apply the roll-up operation, we need the average aggregation operator. 
Although we can use the classical operator in the crisp case, in the fuzzy model we 
need an operator that works with fuzzy bags. In the example, we have used the
operators proposed by Rundensteiner ([14]) for a fuzzy relational model. The adaptation of
these operators to our approach is simple: if $R$ is an aggregation operator defined by
Rundensteiner, the operator $G_R$ for our approach is defined as $G_R(h) = (R(h), 1)$.

We need another operator to show the results in the fuzzy case. We have used the 
linguistic summary ([2]) as the summary operator. The results in both approaches are 
shown in Tables 2-4. When analyzing the results, we need to bear in mind the
differences between both approaches. Therefore, when the user gets the result in the
crisp case, for example for the group young, the results correspond to the query “the
average amount of sales in the interval [22-dec, 6-jan] by the customers with ages in
the interval [0,25] and the quality of the provider”. In the fuzzy case, the user gets a
result which is closer to his/her concept of Christmas and youth.

If we want to refine the results in order to obtain “the maximum average amounts 
sold by age groups”, we need to apply slice on the dimensions Products and Time, 
using the maximum aggregation operator. The result is shown in Table 5. 

The results obtained in each case are different. This occurs because the values in- 
volved in each calculation and their importance are different in both approaches. In 
the crisp case, all the values inside the intervals have the same weight in the aggrega- 
tion process. In the fuzzy model, on the other hand, the values at the edges of the 
concepts do not have the same importance as the values in the kernel in the final re- 
sult. We can also see the role of the user view in the fuzzy model. The multidimen- 
sional structure proposed is based on fuzzy logic and the results shown to the user are 
fuzzy sets which are difficult to understand. The user view helps to interpret the re- 
sults, showing the information obtained in a more expressive and understandable way 
to the user (using a fuzzy number and the associated linguistic expression in each 
case). 



Table 1. Data in the datacube example.

  Fact No.  Product  Date    Age  Price  Amount  α
  1         milk     23-dec  19   10     1       1
  2         meat     7-jan   40   18     3       1
  3         bread    10-jan  45   1      5       1
  4         juice    28-dec  75   2      2       1
  5         cheese   3-jan   20   5      1       1
  6         milk     10-jan  20   1      5       1
  7         bread    25-dec  22   3      1       1
  8         bread    1-jan   55   5      2       1
  9         juice    28-dec  23   4      3       1
  10        bread    6-jan   75   6      4       1
  11        milk     23-dec  78   3      3       1
  12        meat     29-dec  40   18     2       1
  13        bread    6-jan   17   3      2       1
  14        meat     22-dec  65   6      3       1
  15        cheese   2-jan   52   10     2       1
  16        bread    27-dec  66   5      2       1
  17        cheese   04-jan  70   5      3       1
  18        bread    24-dec  60   3      6       1
  19        bread    10-jan  65   4      4       1
  20        milk     03-jan  64   5      2       1
  21        cheese   10-jan  15   5      5       1
  22        cheese   28-dec  40   3      5       1
  23        bread    02-jan  65   4      5       1
  24        milk     26-dec  23   5      5       1
4 Conclusions



In this paper, we have presented a new multidimensional model. The main contribu- 
tion of this new model is that it is able to operate on data with imprecise facts and 
hierarchies. Classical models impose a rigid structure that makes it difficult for infor- 
mation from different sources to be merged if there are incompatibilities in the sche- 
mata. Our model can handle these problems by means of fuzzy logic which allows our 
proposal to carry out the integration, relaxing the schemata in order to obtain a new 
one that covers the others and attempting to preserve as much information as possible. 
In addition, our model can manage information given by experts which is often im- 
precise. This data can be used to improve the multidimensional schema so that it may 
be used by the final user in the decision process. Another advantage is that it can 
model situations to users more naturally so that they can access the information more 
intuitively. 



Table 2. Result of applying dice on the dimension Time, on the level Holiday with the
condition $\beta(x)$ = “x is Christmas” over C. In the fuzzy case, the value shown is the new $\alpha$ of the fact.
In the crisp case, X means that this fact satisfies the condition.

  Fact    1    2    3    4    5    6    7    8    9    10   11   12
  Fuzzy   1    0.9  0.6  1    1    0.6  1    1    1    1    1    1
  Crisp   X    -    -    X    X    -    X    X    X    X    X    X

  Fact    13   14   15   16   17   18   19   20   21   22   23   24
  Fuzzy   1    1    1    1    1    1    0.6  1    0.6  1    1    1
  Crisp   X    X    X    X    X    X    -    X    -    X    X    X



Table 3. Result of applying roll-up in the dimension Time on the level Holiday, dimension
Product and level Quality and dimension Customer and level Group in the datacube C’ in the
fuzzy case. The Time dimension is not shown because it has only one value.

  Customer  Product  C”                                              C”_M (user view)
  Young     Good     {1/1, 0.6/3, 0.4/3.67, 0.2/3.33},1              (1,1,0,1.5) “greater than 1”
  Young     Medium   {1/1, 0.6/3, 0.3/2.88},1                        (1,1,0,1.45) “greater than 1”
  Young     Bad      {1/2, 0.6/1.5, 0.2/2.4},1                       (2,2,0.5,0.39) “around 2”
  Adult     Good     {1/2, 0.9/2.5, 0.6/3.4, 0.5/3.33, 0.2/3.3},1    (2,2,0,1.19) “greater than 2”
  Adult     Medium   {1/3.5, 0.6/3.33, 0.3/3.44},1                   (3.5,3.5,0.17,0) “a bit less than 3.5”
  Adult     Bad      {1/2, 0.8/4, 0.5/3.8, 0.4/3.33, 0.2/3.3},1      (2,2,0,1.6) “greater than 2”
  Old       Good     {1/3, 0.5/2.67, 0.2/2.6},1                      (3,3,0.4,0) “a bit less than 3”
  Old       Medium   {1/2, 0.8/2.5, 0.3/3.22},1                      (2,2,0,1.22) “greater than 2”
  Old       Bad      {1/4, 0.6/3, 0.5/3.75, 0.3/4.2, 0.2/3.71},1     (4,4,0.29,0.19) “around 4”



Table 4. Result of applying roll-up in the dimension Time on the level Holiday, dimension
Product and level Quality and dimension Customer and level Group in the datacube C’ in the
crisp case.

                         Product
  Customer (Age group)   Good   Medium   Bad
  Young                  3      2        1.5
  Adult                  2      3.5      4
  Old                    3      2.5      3.7
Table 5. Result of applying slice on the dimensions Product and Time in the datacube C”.

  Customer  Fuzzy: C’”                                                        Fuzzy: C’”_M (user view)          Crisp
  Young     {1/2, 0.6/1.5, 0.2/2.4, 0.6/3, 0.3/2.88, 0.4/3.67, 0.2/3.33},1    (2,2,0.5,1.3) “around 2”          3
  Adult     {1/3.5, 0.8/4, 0.6/3.8, 0.6/3.33, 0.3/3.44, 0.5/3.67},1           (3.5,3.5,0.17,0.5) “around 3.5”   4
  Old       {1/4, 0.6/3, 0.5/3.75, 0.3/4.2, 0.2/3.71, 0.3/3.22},1             (4,4,0.99,0.2) “around 4”         3.7



In order to complete the model, we need to study the properties of the operations 
on the structure. Another line is to develop a graphical means of representing the 
results of the operations so that the information obtained may be read more intui- 
tively. To finish the decision process, we need to study the integration process so as to 
obtain a formal way to merge data from different sources, including experts’ knowl- 
edge. 



References 



1. Agrawal, R., Gupta, A., Sarawagi, S.: Modeling Multidimensional Databases. IBM
Research Report, IBM Almaden Research Center, September 1995

2. Blanco, I., Sanchez, D., Serrano, J.M., Vila, M.A.: A New Proposal of Aggregation
Functions: the Linguistic Summary. Proceedings of IFSA’2003, Istanbul (Turkey) 2003

3. Cabibbo, L., Torlone, R.: A Logical Approach to Multidimensional Databases. Advances
in Database Technology (EDBT’98), No. 1337 in LNCS, pp. 183-197, Springer 1998

4. Cabibbo, L., Torlone, R.: Querying Multidimensional Databases. Proceedings of the 6th
Int. Workshop on Database Programming Languages (DBPL6), Estes Park (U.S.A.) 1997

5. Codd, E.F.: Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT
Mandate. Technical report, E.F. Codd and Associates, 1993

6. Delgado, M., Martín-Bautista, M.J., Sanchez, D., Vila, M.A.: On a Characterization of
Fuzzy Bags. Proceedings of IFSA’2003, Istanbul (Turkey) 2003

7. Dyreson, C.: Information Retrieval from an Incomplete Data Cube. Proceedings of the 22nd
Int. Conf. on VLDB, pp. 532-543. Morgan Kaufmann Publishers, 1996

8. Gorry, G.A., Scott Morton, M.S.: A Framework for Management Information Systems.
Sloan Management Review 13 (1) (1971) 50-70

9. Gray, J., Chaudhuri, S., Bosworth, A., Layman, A., Reichart, D., Venkatrao, M.: Data
Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and
Sub-Totals. Data Mining and Knowledge Discovery 1 (1997) 29-53

10. Kimball, R.: The Data Warehouse Toolkit. Wiley, New York, 1996

11. Laurent, A., Bouchon-Meunier, B., Doucet, A.: Flexible Unary Multidimensional Queries
and their Combinations. Proceedings of IPMU 2002, Annecy (France) 2002

12. Li, C., Wang, X.S.: A Data Model for Supporting On-Line Analytical Processing.
Proceedings of the 5th Int. Conf. on Information and Knowledge Management (CIKM) 1996

13. Pedersen, T.B., Jensen, C.S., Dyreson, C.E.: A Foundation for Capturing and Querying
Complex Multidimensional Data. Information Systems 26 (2001) 383-423

14. Rundensteiner, E.A., Bic, L.: Aggregates in Possibilistic Databases. Proceedings of the 15th
Conf. on Very Large Databases (VLDB’89), Amsterdam (Holland), 287-295, 1989

15. Yager, R.R.: Aggregation Operators and Fuzzy Systems Modelling. Fuzzy Sets and
Systems 67 (1994) 129-145



A Framework for Ontology Reuse 
and Persistence Integrating UML and Sesame 



Carlos Pedrinaci, Amaia Bernaras, Tim Smithers, 
Jessica Aguado, and Manuel Cendoya 

San Sebastian Technology Park, Paseo Mikeletegi 53,

20009 San Sebastian, Spain 

{carlos, amaia, tsmithers, jessica}@miramon.net
mcendoya@miramon.es



Abstract. Nowadays there is a great effort underway to improve the World 
Wide Web. A better content organisation, allowing automatic processing, lead- 
ing to the Semantic Web is one of the main goals. In the light of bringing this 
technology closer to the Software Engineering community we propose an archi- 
tecture allowing an easier development for ontology-based applications. Thus, 
we first present a methodology for ontology creation and automatic code
generation using the widely adopted CASE UML tools. Then, based on a
state-of-the-art study of the different RDF storage and querying systems, we couple this
methodology with the Sesame system to provide a framework able to deal with
large knowledge bases.



1 Introduction 

The huge amount of information available in the World Wide Web has led researchers 
to work towards improving its organisation, by providing machine-understandable 
data. “The Semantic Web is an extension of the current web in which information is
given well-defined meaning, better enabling computers and people to work in
cooperation.” [1]. It is obvious that the Semantic Web will offer new possibilities for the
web but as Mark Frauenfelder suggests “There is a big question as to whether people
will think the benefits are worth the extra effort of adding metadata to their content in
the first place. One reason the Web became so wildly successful, after all, was its
sublime ease of creation.” [2].

This paper presents some of the results obtained in the ongoing EU project OBE- 
LIX (IST-2001-33144) during the creation of an ontology-based online events design 
application [3], [4]. We propose a framework for the development of Semantic Web
applications so as to bring this technology closer to the Software
Engineering community. Bearing that purpose in mind, the proposed framework
focuses on ease of creation and use. In the same way that web designers do not have to be
aware of the details of the HTTP protocol (and very often not even of HTML details), it would be
interesting to obtain the same level of independence from the implementation details
surrounding the Semantic Web, which are much more complex. Obtaining such
facilities for creating Semantic Web applications is difficult due to its inherent complexity,
but we should nevertheless try to fill the gap between the AI community and the Software
Engineering community by providing an easy and suitable framework. Moreover,
software agents will need to interact with other systems, usually based on different 
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ontologies, supported by different architectures and adequately supporting the interac- 
tion with humans. Semantic Web applications are complex systems, thus, maintaining 
and/or improving them is a hard task. Software Engineering has proven that in such 
cases, and in general for every system, a clear, well defined and powerful methodol- 
ogy is a must. Such methodologies facilitate the creation and minimize the problems 
raised when improving and modifying a system. 

In this paper, we first present the use of the Unified Modeling Language (UML)
[5] for knowledge representation, along with a procedure for generating, from a UML
class diagram, a specialised RDF schema [6], [7] and a set of Java classes
corresponding to the classes in the model. Afterwards we compare the different RDF
storage and querying systems, and justify the selection of Sesame for ensuring
persistence for large RDF knowledge bases [8]. Next, we present and explain Sesame.
In section five, we propose an architecture for developing ontology-based
applications, using UML for knowledge representation and relying on Sesame for data
persistence. Finally, we conclude and present some directions for future research.



2 UML for Knowledge Representation and Exchange 

The Unified Modeling Language (UML) is a standard language from the Object Man- 
agement Group (OMG) [9] with an associated graphical notation for object-oriented 
analysis and design. It is widely adopted in industry, and several CASE tools are 
already available to facilitate software engineers' work. The benefits of using UML 
for ontology development have been extensively argued in [10], [11], [12] and [13]. 
Some of these benefits are: (i) UML is a standard language; (ii) UML is a graphical
notation based on many years of experience in software analysis and design, and it is
currently supported by widely adopted CASE tools that are more accessible to
software practitioners than current ontology tools; (iii) agent-based systems will need
to interact with legacy enterprise systems, which often have UML models; (iv)
knowledge expressed using UML is directly accessible for human comprehension and
for machine processing; (v) thanks to the modular nature of object-oriented
modelling, the knowledge in a UML model can be changed without affecting other
features.

In [11] and [13] Stephen Cranefield proposes an implementation for object-oriented
knowledge representation, using UML for defining ontologies and domain knowledge
in the Semantic Web. Fig. 1 shows a pictorial description of this proposal. The
proposed methodology is as follows. First, a domain expert designs the ontology
graphically with one of the available CASE tools supporting UML (e.g. Rational
Rose, Poseidon, ArgoUML, etc.). The ontology is then saved in the standard XML
Metadata Interchange (XMI) format [14]. Using a pair of XSLT stylesheets, the XMI
representation of the ontology is transformed into a set of Java classes and interfaces
corresponding to the concepts present in the ontology, and into an RDF schema. The
Java classes allow an application to represent knowledge about the domain as
in-memory data structures. The RDF schema defines the concepts that an application
can reference when serialising the knowledge in RDF/XML. A marshalling package is
also provided for marshalling and unmarshalling objects to and from RDF/XML
documents. This feature is provided via two classes: MarshalHelper and
UnmarshalHelper. These delegate to the generated Java classes decisions about the
names and types for each field, and are then called back to perform the marshalling
to, or unmarshalling from, RDF, using the Stanford RDF API [15].

Fig. 1. Overview of the implementation for object-oriented knowledge representation. (Taken
from [13]).



It is important to note that the generated RDF schema does not contain all the in- 
formation from the designed UML model. Its purpose is to define resources corre- 
sponding to all the classes, interfaces, attributes and associations in the ontology in 
order to allow serialisation of in-memory objects in the standard language RDF. Thus, 
for accessing all the ontology information one of the available Java APIs for XMI can 
be used: [16] and [17]. 

The system also allows incomplete knowledge to be modelled. To that end, the
generated Java classes include an extra boolean field for each attribute that records
whether the value is known or not. Also, when marshalling incomplete information, a
non-standard RDF property, notClosedFor, is used to associate a property with a
resource, meaning that the information is incomplete.
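
The handling of incomplete knowledge can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Java that the stylesheets generate, and the `Event` class, its fields, the `EX` namespace and the direction of the `notClosedFor` triple are all assumptions made for illustration; they are not taken from the generated code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace; the real URIs are not given in the text.
EX = "http://example.org/events#"


@dataclass
class Event:
    """Hand-written stand-in for one generated ontology class."""
    uri: str
    title: Optional[str] = None
    title_known: bool = False      # extra flag: is the value of `title` known?
    location: Optional[str] = None
    location_known: bool = False   # extra flag: is the value of `location` known?


def marshal(event: Event) -> List[Triple]:
    """Serialise an Event to triples, flagging attributes whose value is unknown."""
    triples: List[Triple] = [(event.uri, "rdf:type", EX + "Event")]
    for name in ("title", "location"):
        prop = EX + name
        if getattr(event, name + "_known"):
            triples.append((event.uri, prop, str(getattr(event, name))))
        else:
            # Assumed encoding: link the resource to the property for which
            # the information is incomplete.
            triples.append((event.uri, EX + "notClosedFor", prop))
    return triples
```

An event whose title is known but whose location is not would yield one ordinary property triple and one `notClosedFor` triple, so a reader of the RDF can tell "unknown" apart from "absent".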

Obtaining an instance from the RDF/XML representation involves parsing the 
whole file, which is not a problem for small knowledge bases. However, when deal- 
ing with large knowledge bases, there are more efficient approaches: RDF storage and 
querying systems. 

3 Comparison of the RDF Storage and Querying Systems 

To adapt Cranefield's approach to large knowledge bases, we have studied the differ- 
ent RDF storage and querying facilities available. The state of the art of the different 
systems is based on [18], with updated information. 
Table 1 presents an analysis of the different storage systems currently available.
The main criteria kept in mind for determining the RDF storage and querying
system that best suits our needs are:

• Storage: The method/architecture used for ensuring data persistence.

• Platform: List of all the different platforms supported. It includes the Operating 
Systems but also the need for any other components like a Perl interpreter or a 
Java Virtual Machine. 

• API: The possible ways for interacting with the system. It includes protocols 
and APIs provided for different programming languages. 

• Querying: The languages the system supports for querying a data repository.

• Inferencing: The capability of the system to infer new knowledge, that is, to
generate new statements based on the existing knowledge. For the majority of
the systems only class subsumption is provided. However, some systems allow
more powerful inferencing by providing mechanisms for defining user rules.

• Extras: Whether the system has other functional elements associated or prepared 
for interacting with it. 

From the analysis and comparison performed, and shown in Table 1, Sesame was
chosen for the following reasons:

Sesame allows inferencing over RDF(S) thanks to its query language RQL [19],
[20]. Moreover, the system can be deployed on any platform with a Java Virtual
Machine. It provides several ways of interacting with it, such as RMI, SOAP or
HTTP. It has been installed on top of many DBMSs, such as Oracle, MySQL or
PostgreSQL, and has a generic implementation for SQL92-compliant DBMSs. In
addition to all these characteristics, support for DAML+OIL [21] has been added,
improving its capabilities but also showing Sesame's modularity and the possibility of
adapting the system to new languages. Finally, the newly implemented versioning and
access control features turn Sesame into a suitable system for developing and
maintaining knowledge bases, providing the same level of control as CVS does for
programmers.

It is worth noting that, although KAON [22] and Cerebra [23] are good candidates
because of their interesting features, Sesame is superior to KAON in its support for
DAML+OIL. As for Cerebra, the fact that it is not open source was decisive.



4 Sesame 

Sesame is a system for efficient storage and expressive querying of large quantities of
metadata in RDF and RDF Schema. It was initially developed by Aidministrator
Nederland b.v. as part of the European IST project On-To-Knowledge [24] and is
currently being extended and improved by Aidministrator Nederland b.v., the “Sesame
community” and NLNet [25].

Table 1. RDF storage and querying systems comparison.

“Sesame's design and implementation are independent from any specific storage
device. Thus, Sesame can be deployed on top of a variety of storage devices, such as
relational databases, triple stores, or object-oriented databases, without having to
change the query engine or other functional modules” [8]. This independence is
granted by the Storage And Inference Layer (SAIL) (see Fig. 2). SAIL is an
Application Programming Interface (API) that offers specific methods for accessing
RDF information. It defines a basic interface for storing, retrieving and deleting RDF
and RDFS from repositories while abstracting from the particular storage mechanism.
It was designed to support low-end hardware such as PDAs and to be extendable to
other RDF-based languages. Several implementations of SAIL are distributed with
Sesame, such as SQL92SAIL, which is a generic implementation for SQL92-compliant
DBMSs, SyncSAIL for supporting concurrent reads, and implementations for specific
DBMSs such as MySQL, OracleDB and PostgreSQL.
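
To make the role of a storage abstraction concrete, the following Python sketch shows query code written against a small interface with one in-memory backend. The interface name `TripleStore`, its three methods and the `InMemoryStore` class are hypothetical and far simpler than the actual SAIL API; the point is only that a second backend (for instance one on top of a relational database) could be swapped in without touching `subclasses_of`.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore(ABC):
    """Hypothetical storage interface in the spirit of SAIL (not Sesame's API)."""

    @abstractmethod
    def add(self, s: str, p: str, o: str) -> None: ...

    @abstractmethod
    def remove(self, s: str, p: str, o: str) -> None: ...

    @abstractmethod
    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]: ...


class InMemoryStore(TripleStore):
    """Simplest possible backend: a Python set of triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def remove(self, s: str, p: str, o: str) -> None:
        self._triples.discard((s, p, o))

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        # None acts as a wildcard; results are sorted for reproducibility.
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple


def subclasses_of(store: TripleStore, cls: str) -> List[str]:
    """Query code that depends only on the interface, not on the backend."""
    return [s for s, _, _ in store.match(p="rdfs:subClassOf", o=cls)]
```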



Fig. 2. Sesame's architecture. Taken from [8].



Sesame implements the Resource Query Language (RQL), a declarative language
for querying both RDF descriptions and RDF schemas, as well as RDQL [26], which
is derived from SquishQL [27]. These functions are provided by the Query Module,
which performs the queries on a repository. Any query is first parsed to build a tree
model representation, which is afterwards optimised. Most of the query is evaluated
in this module, while access to the repository is handled by SAIL. It is important to
note that Sesame implements a slightly modified version of the RQL language
proposed in [20]. Sesame's version of RQL includes support for domain and range
restrictions as well as multiple domain and range restrictions, but it does not feature
support for datatyping.

For the metadata administration, another module is provided, the Admin Module. 
Its purpose is to manage the insertion and deletion of RDF and RDF Schema informa- 
tion into/from a repository. 
The extraction of any information from a Sesame repository is handled by the
Export Module. This module allows the schema, the data, or both to be selectively
exported from a repository, facilitating integration and interaction with other RDF
tools.

Concerning the interaction with external applications, Sesame currently offers three
methods: HTTP, SOAP and RMI. Each protocol has its associated handler, which
translates any query received and redirects it to an intermediate module: the Request
Router. This intermediate module isolates Sesame's core from any protocol-specific
details, so a new handler can be added without modifying the rest of the system.

To make the results of the On-To-Knowledge project easier to integrate into
real-world applications, an “administrative” software infrastructure was created: the
Ontology Middleware (OMM). “The central issue is to make the methodology and
modules available to the society in a shape that allows easier development,
management, maintenance, and use of middle-size and big knowledge bases” [28]. In
particular, the OMM supports versioning, access control and meta-information for
knowledge bases, forming the Knowledge Control System (KCS). In addition to the
administrative modules, BOR extends the reasoning capabilities of Sesame by
providing support for DAML+OIL. This new reasoning module implements the SAIL
API, so it can interact seamlessly with the rest of Sesame's modules.



5 Architecture Proposal 

We have seen previously that in Stephen Cranefield's approach a marshalling package
is used for marshalling and unmarshalling object-oriented information between
in-memory data structures and RDF serialisations of that information. This solution is
not efficient enough for managing large RDF files. Thus, the available RDF storage
and querying tools have been studied, and Sesame was chosen based on its
characteristics.

In order to support large knowledge bases (more than five thousand triples), we
adapt Stephen Cranefield's approach by replacing the marshalling elements with calls
to the Sesame API, as shown in Fig. 3. Any serialisation or deserialisation of
knowledge is performed over an RDF repository in Sesame. The generation of
ontology-based applications remains, from the developer's point of view, unchanged
and transparent. The process still involves editing the ontology in a CASE
environment supporting UML and XMI. Afterwards the Java classes and the RDF
Schema file are generated, and their usage during the creation of an ontology-based
application remains unmodified. However, the architecture gains greatly in versatility
and power thanks to the new mechanisms provided by Sesame for persisting and
accessing the knowledge base. The RDF/RDF Schema is stored in a Sesame
repository. Thus, applications interact with Sesame for retrieving and/or storing
knowledge, and at the same time they have all of Sesame's features available, such as
the query language RQL.

There is, however, an important difference concerning the generation of the Java
classes. The proposed architecture keeps the XSLT stylesheet for generating the RDF
schema file, whereas the generation of the Java classes is no longer performed using
XSLT. We are developing a Java Code Generator that benefits from Sesame's features
by accessing the ontology through the Sesame repository where the associated RDF
schema has been stored. Thanks to the SAIL API that Sesame offers, our program can
browse the whole ontology in a more comfortable way. Thus, the difficulties
associated with the use of a stylesheet processor are avoided. Moreover, the code
generation gains in modularity and ease of maintenance, so that future improvements
can be easily added.
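
A rough idea of schema-driven code generation is given by the Python sketch below. It assumes the schema is available as plain (subject, predicate, object) triples with prefixed names, types every field as `String`, and uses a hypothetical function `generate_java`; the actual Java Code Generator and the way it reads the repository are not described at this level of detail here.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def generate_java(triples: Iterable[Triple]) -> Dict[str, str]:
    """Return {class name: Java source text} for every rdfs:Class in the schema."""
    triples = list(triples)
    classes = sorted(s for s, p, o in triples
                     if p == "rdf:type" and o == "rdfs:Class")
    # Single inheritance only: one parent per class is assumed.
    parent = {s: o for s, p, o in triples if p == "rdfs:subClassOf"}
    # A property becomes a field of the class named as its rdfs:domain.
    fields: Dict[str, List[str]] = {c: [] for c in classes}
    for s, p, o in triples:
        if p == "rdfs:domain" and o in fields:
            fields[o].append(s)

    sources: Dict[str, str] = {}
    for cls in classes:
        extends = f" extends {parent[cls]}" if cls in parent else ""
        lines = [f"public class {cls}{extends} {{"]
        for name in sorted(fields[cls]):
            # Every field is typed as String to keep the sketch short.
            lines.append(f"    private String {name};")
        lines.append("}")
        sources[cls] = "\n".join(lines)
    return sources
```

Because the generator works on query results rather than on the XMI document, adding a new output (for example accessor methods) only means extending the loop that builds `lines`.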

We are also investigating another important aspect, which is the possibility of 
adapting the whole system to a more powerful language like DAML+OIL. Several 
projects are already using UML and DAML+OIL together. The UML Based Ontology
Tool-set (UBOT) project [29] is working on a UML to DAML mapping [30]. In
this project UML is also used as a front-end for visualizing and editing DAML on- 
tologies. Also, the Components for Ontology Driven Information Push (CODIP) pro- 
ject [31] is using UML to build and map DAML ontologies. This project is creating 
the DAML-UML Enhanced Tool (DUET) which provides a UML visualization and 
authoring environment for DAML. Core DAML concepts are being mapped into 
UML through a UML profile for DAML. DUET is currently available as a plug-in for 
Rational Rose [32] and ArgoUML [33]. The results of both projects could be applied 
to the proposed architecture for obtaining a “DAML+OIL version”. 



[Figure: diagram with the elements UML, XMI, XSLT, RDF Schema, Sesame Repository,
Sesame API, Java Code Generator, Java source files and Applications.]

Fig. 3. Architecture proposal.



Finally, in addition to the persistence-related benefits, Sesame comes with a Web
interface that can be installed on a web server such as Tomcat. This is a step forward
for publishing the ontologies along with the instances on the World Wide Web, so that
external applications such as agents can also retrieve the information and process it.



6 Conclusions and Future Research 

In the previous sections we have described an architecture for creating ontology-based
applications in a way that is more suitable for software engineers than currently
available tools such as OilEd, OntoEdit or Protégé. This architecture integrates the
UML to RDF mapping based on the approach presented in [11] and [13] with Sesame
as the RDF storage and querying system. This integration is further improved by the
addition of our Java Code Generator, which makes use of the best of the two
integrated approaches. The result is a framework for developing ontology-based
applications in an easy and scalable way, with automatic code generation to facilitate
the use of object diagrams as internal knowledge representation structures. However,
the majority of the ontology-based applications developed so far have shown that an
ontology expressed in RDF or DAML+OIL is not enough to obtain all the
functionality needed. They still need the capability to define rules and constraints, so
as to provide more powerful inferencing over the knowledge base. Unfortunately,
there is no standard language for defining rules. Different developers have solved this
with ad hoc methods: choosing the most appropriate and convenient inferencing
engine, or directly using hard-wired code. In our case, no mechanism is provided for
defining inferencing rules, so it would be desirable to cover that aspect as well. UML's
definition includes the Object Constraint Language (OCL); however, it lacks a formal
definition. Currently the precise UML group [34] is addressing this issue.

With a formal specification, the code generation could also integrate automatic
rule generation based on the OCL rule definitions. This kind of code generation has
already been undertaken by Frank Finger in [35]. Further research is needed in that
respect.

Finally, we are also investigating dynamic code generation over evolving ontolo- 
gies so as to provide a better adaptability to the dynamism of the Web. 
Abstract. In recent years, particle filters have emerged as a useful tool that
enables the application of Bayesian reasoning to problems requiring dynamic state
estimation. The efficiency and accuracy of this type of filter are highly dependent
on an appropriate propagation of the particles in time. In this paper we present a
new method to improve the propagation step of the regular particle filter. Using
results from the theory of importance sampling, our method adaptively propagates
the set of samples without adding a significant computational load to the normal
operation of the filter. Compared to existing techniques, our approach introduces
two important enhancements: 1) an adaptive method to improve the propagation
function, and 2) a mechanism to identify when the use of adaptation is beneficial.
We show the advantages of our method by applying the resulting filter to the visual
tracking of targets in a real video sequence.



1 Introduction 

The particle filter is a highly general tool used to perform dynamic state estimation 
via Bayesian inference. In recent years, this tool has been successfully applied in
diverse engineering fields to solve a variety of problems. The key idea is to represent a 
posterior distribution by samples (particles) that are constantly re-allocated after each 
new estimation of the state. 

The re-allocation or propagation of the samples in time plays a key role in the effi- 
ciency and accuracy of the particle filter. As a Monte Carlo (MC) technique the effec- 
tiveness of the filter depends on allocating the samples in key areas of the hypotheses 
space. The traditional implementation of the particle filter uses a combination of the 
current estimation of the posterior and the dynamics of the process to allocate the sam- 
ples to the next iteration. Therefore the efficiency of the filter is highly dependent on 
how this combination or dynamic prior can resemble the new posterior distribution. 

The main limitation of using the dynamic prior as the importance function is that it 
does not consider the most recent observations. This can be highly inefficient in cases 
where the current observations do not support relevant areas under the prior, or in 
Bayesian terms, when there is a disagreement between the prior distribution and the 
likelihood function. 

Previous works [4] [7] presented methods to improve the propagation step of the
particle filter by incorporating the most recent evidence available into the predictions.
In this paper we exploit the same idea, but our method provides two important and
complementary advantages: 1) there is a generic and mathematically founded
mechanism to improve the propagation function, and 2) there is a mechanism to decide
when it is worth improving the propagation function. To our current knowledge, this
last point has not been addressed before.



2 Background 

2.1 Particle Filter 

The main goal of a particle filter is to keep track of a posterior distribution. In the 
dynamic case, the posterior distribution can be expressed through Bayes’ rule by: 

$$P(x_t / y_t) = \beta \, P(y_t / x_t) \, P(x_t / y_{t-1}) \tag{1}$$

where $\beta$ is a normalization factor; $x_t$ represents the state of the system at time
$t$; and $y_t$ represents all the information collected until time $t$. Equation (1)
assumes that $x_t$ totally explains the current observation $y_t$.

The particle filter estimates the posterior in Equation (1) by a discrete distribution
given by a set of weighted samples. The estimation is achieved in three main steps: 
sampling, weighting, and re-sampling. The sampling step assumes that the dynamics of 
the system follows a first order Markov process. Then the dynamic prior in Equation 
(1) can be expressed by: 



$$P(x_t / y_{t-1}) = \sum_{i=1}^{N} P(x_t / x_{t-1}^i) \, P(x_{t-1}^i / y_{t-1}) \tag{2}$$

Equation (2) provides a recursive implementation of the filter, which is one of the key
points that explains its efficiency. Equation (2) allows the filter to use the last estimation
$P(x_{t-1} / y_{t-1})$ to select the particles for the next iteration. These particles are then
propagated by the dynamics of the process $P(x_t / x_{t-1}^i)$ to complete the sampling step.
Next, in the weighting step, the resulting particles are weighted by a likelihood term.
Finally, a re-sampling step is usually applied to avoid the degeneracy of the particle
set [2].
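
The three steps can be sketched for a one-dimensional state as follows. The Gaussian random-walk dynamics, the Gaussian likelihood, and the names `particle_filter_step`, `process_std` and `obs_std` are assumptions chosen to keep the example short; they are not part of the method described here.

```python
import math
import random
from typing import List


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_step(particles: List[float], observation: float,
                         rng: random.Random,
                         process_std: float = 1.0,
                         obs_std: float = 0.5) -> List[float]:
    """One sampling / weighting / re-sampling cycle of the regular filter."""
    # Sampling: propagate each particle through the (assumed) dynamics.
    proposed = [rng.gauss(x, process_std) for x in particles]
    # Weighting: un-normalized weights given by the (assumed) likelihood.
    weights = [gaussian_pdf(observation, x, obs_std) for x in proposed]
    # Re-sampling: draw N particles with probability proportional to weight.
    return rng.choices(proposed, weights=weights, k=len(proposed))
```

Note that the particles are propagated before the observation is looked at, which is exactly the limitation of the dynamic prior discussed in the text.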

Recently, independent works by Doucet [1] and Liu et al. [3] present an interesting 
alternative view of the filter in terms of the statistical principle of importance sampling 
[6]. Importance sampling provides an efficient way to obtain samples from a density 
p(x), that we call the true distribution, in cases where the function can be evaluated, 
but it is not convenient or possible to sample directly from it. The basic idea is to use a 
proposal distribution q(x) (also called importance function) to obtain the samples, and 
then weigh each sample $x_i$ by a compensatory term given by $p(x_i)/q(x_i)$. It is possible
to show [6] that under mild assumptions the set of weighted-samples can be used to 
represent p(x). 
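
A small numerical illustration of importance sampling, under assumed densities: the true distribution is taken to be a Gaussian with mean 2 and standard deviation 1, and the proposal a Gaussian with mean 0 and standard deviation 3. The sketch estimates the mean of $p$ from weighted samples of $q$, normalising the weights by their sum; the function names are illustrative.

```python
import math
import random


def normal_pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_mean(n: int, rng: random.Random) -> float:
    """Estimate the mean of p = N(2, 1) using n samples drawn from q = N(0, 3)."""
    # Draw from the proposal q, which is easy to sample.
    samples = [rng.gauss(0.0, 3.0) for _ in range(n)]
    # Compensatory term p(x_i) / q(x_i) for each sample.
    weights = [normal_pdf(x, 2.0, 1.0) / normal_pdf(x, 0.0, 3.0)
               for x in samples]
    # Self-normalised estimate of the expectation under p.
    return sum(w * x for w, x in zip(weights, samples)) / sum(weights)
```

The proposal is deliberately broader than the target, so every region where $p$ has mass is covered and the weights stay bounded.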

In terms of importance sampling, it is possible to view the sampling and weighting
steps of the particle filter as the basic steps of an importance sampling process. In this
case, given that the true posterior $p(x_t / y_t)$ is not known, the samples are drawn from
an importance function that corresponds to the dynamic prior $p(x_t / y_{t-1})$. Using this
importance function, the compensatory terms are exactly the un-normalized weights
used in the weighting step of the particle filter.

The interpretation of the particle filter in terms of importance sampling provides
a more general setting. In particular, the theory of importance sampling suggests that
one can use alternative proposal distributions that can achieve a better allocation of the
samples. Unfortunately, the use of an arbitrary importance function can significantly
increase the computational load in the calculation of the weights. To see this clearly,
consider the use of an arbitrary importance function $g(x_t / x_{(\cdot)}, y_{(\cdot)})$. Using an MC
approximation of the dynamic prior $p(x_t / y_{t-1})$, the un-normalized weight $w_t^j$
corresponding to the sample $x_t^j$ is given by:

$$w_t^j = \frac{p(y_t / x_t^j) \sum_{k=1}^{M} p(x_t^j / x_{t-1}^k)}{g(x_t^j / x_{(\cdot)}, y_{(\cdot)})} \tag{3}$$

where each $x_{t-1}^k$ is a fair sample from $p(x_{t-1} / y_{t-1})$. In this case, as opposed to the
standard particle filter, the estimation of each weight requires the evaluation of the
dynamic prior. This increases the computational complexity of the resulting filter to
$O(M \cdot N)$, where $M$ is the number of samples used for the MC approximation of the
dynamic prior and $N$ is the number of particles. Given that $M$ and $N$ are generally
very large, the use of an arbitrary importance function takes away the computational
efficiency of the particle filter, which is one of its main strengths. In this paper we show
a new method to build a suitable importance function that takes into account old
estimates of the state and current observations. One of the advantages of this new
approach is that the complexity of the resulting filter is still $O(N)$.

2.2 Previous Work 

In the literature about particle filters and importance sampling, it is possible to find 
several techniques that help to allocate the samples in areas of high likelihood under 
the target distribution. The most basic technique is rejection sampling [6]. The idea of 
rejection sampling is to accept only the samples with an importance weight above a 
suitable value. The drawback is efficiency: there is a high rejection rate in cases where 
the proposal density does not match closely the target distribution. 

In [8], West presents a method to adaptively build a suitable importance function 
using a kernel-based approximation (mixture approximation). The basic idea is to apply 
consecutive refinements to the mixture representation until it resembles the posterior 
with a desired accuracy. This approach is simple and general, but the computational 
complexity is $O(R \cdot M \cdot N)$, where $R$ is the number of refinements, $M$ the number of
components in the mixture, and $N$ the number of particles.

Pitt and Shephard propose the auxiliary particle filter [4]. They argue that the
computational complexity of the particle filter can be reduced by performing the sampling
step in a higher dimension. To achieve this, they augment the state representation with
an auxiliary variable $k$ that corresponds to the index in the sum used to calculate the
dynamic prior in Equation (3). To sample from the resulting joint density, Pitt and
Shephard use a generic importance function that produces a sampling scheme that is
$O(N)$. The gain in efficiency comes from using an importance function that is
proportional to the probability that a particle $x_{t-1}^k$ evolves to a particle $x_t^j$ with a high
probability under the likelihood function. The disadvantage of the method is the
additional complexity of finding such a convenient importance function. Pitt and
Shephard give just some general intuitions about the form of a possible function, and in
this paper we improve on this point by presenting a new method to find a suitable
importance function.

In the context of mobile robot localization, Thrun et al. [7] notice that in cases
where the likelihood function consists of a high peak, meaning a low level of noise in
the observations, the particle filter suffers a degradation of performance. This suggests
that the particle filter performs worse with accurate sensors. The explanation for this
counter-intuitive observation is that, for a peaked likelihood, a slightly inaccurate
prior can produce a significant mismatch between these distributions. To solve this
problem they propose using, as an importance function, a mixture proposal distribution
consisting of the prior and the likelihood function. The problem with this approach is
the need to sample directly from the likelihood function, which in many applications is
infeasible or prohibitively expensive.



3 Our Approach: Adaptive Propagation of the Samples 

The regular implementation of the particle filter uses the dynamic prior as the
importance function. Although this simplifies the calculation of the importance weights,
allowing a computational complexity of $O(N)$, it has the limitation of allocating the
samples without considering the most recent observation $y_t$. This section shows a new
algorithm that improves this situation by incorporating the current observation in the
generation of the samples, while keeping the computational complexity at $O(N)$.
Consider the following expression for the dynamic prior:

$$p(x_t / y_{t-1}) = \int p(x_t / x_{t-1}) \, p(x_{t-1} / y_{t-1}) \, dx_{t-1}. \tag{4}$$

Using the particle filter and MC integration, this integral can be approximated by:

$$p(x_t / y_{t-1}) \approx \sum_{k=1}^{n} \beta_k \, p(x_t / x_{t-1}^k) \tag{5}$$

where the set of weighted samples $\{x_{t-1}^k, \beta_k\}_{k=1}^{n}$ corresponds to the approximation
of the posterior given by the particle filter at time $t-1$. Using this approximation, it is
possible to generate samples from the dynamic prior by sampling from a set of densities
$p(x_t / x_{t-1}^k)$ with mixture coefficients $\beta_k$.
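
Sampling from the mixture in Equation (5) amounts to choosing a component index with probability proportional to its coefficient and then drawing from the chosen propagation density. The sketch below assumes a scalar state and Gaussian random-walk dynamics; the function name and parameters are illustrative.

```python
import random
from typing import List, Sequence


def sample_dynamic_prior(prev_particles: Sequence[float],
                         betas: Sequence[float],
                         n: int,
                         rng: random.Random,
                         process_std: float = 1.0) -> List[float]:
    """Draw n samples from the mixture approximation of the dynamic prior."""
    # Choose mixture components k with probability proportional to beta_k ...
    chosen = rng.choices(prev_particles, weights=betas, k=n)
    # ... then draw x_t from the chosen propagation density, assumed here
    # to be a Gaussian centred on the previous particle.
    return [rng.gauss(x_prev, process_std) for x_prev in chosen]
```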

The previous sampling scheme is analogous to the re-sampling and sampling steps
of the regular particle filter. Under this scheme the selection of each propagation
density depends on the mixture coefficients $\beta_k$, which do not incorporate the most recent
observation $y_t$. From an MC perspective, it is possible to achieve a more efficient
allocation of the samples by including $y_t$ in the generation of the coefficients. The
intuition is that the incorporation of $y_t$ increases the number of samples drawn from
mixture components $p(x_t / x_{t-1}^k)$ associated with areas of high probability under the
likelihood function.

Under the importance sampling approach, it is possible to generate a new set of
coefficients $\beta_k^*$ by sampling from the desired importance function $p(x_{t-1} / y_t)$ and then
adding appropriate weighting factors. In this way, the set of samples $x_t^i$ from the
dynamic prior $p(x_t / y_{t-1})$ is generated by sampling from the mixture,

$$\sum_{k=1}^{n} \beta_k^* \, p(x_t / x_{t-1}^k) \tag{6}$$



and then adding to each particle $x_t^i$ a correcting weight given by,

$$w_t^i = \frac{p(x_{t-1}^k / y_{t-1})}{p(x_{t-1}^k / y_t)} \quad \text{with } x_t^i \sim p(x_t / x_{t-1}^k). \tag{7}$$
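
As a quick check of why the correcting weights keep the samples distributed according to the dynamic prior, assume (as the importance sampling argument suggests) that each correcting weight is the ratio of the old coefficient to the new one, $\beta_k / \beta_k^*$, where $\beta_k$ approximates $p(x_{t-1}^k / y_{t-1})$ and $\beta_k^*$ approximates $p(x_{t-1}^k / y_t)$. Weighting each component of the new mixture by this ratio recovers the original mixture:

```latex
\sum_{k=1}^{n} \beta_k^{*} \, \frac{\beta_k}{\beta_k^{*}} \, p(x_t / x_{t-1}^k)
  \;=\; \sum_{k=1}^{n} \beta_k \, p(x_t / x_{t-1}^k)
  \;\approx\; p(x_t / y_{t-1})
```

The identity only requires $\beta_k^* > 0$ wherever $\beta_k > 0$, which is the usual support condition on a proposal distribution.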



The resulting set of weighted samples $\{x_t^i, w_t^i\}_{i=1}^{n}$ still comes from the dynamic prior,
so the computational complexity of the resulting filter is still $O(N)$. The extra
complexity of this operation comes from the need to evaluate and to draw samples from
the importance function $p(x_{t-1} / y_t)$. Fortunately, the calculation of this function can
be obtained directly from the operation of the regular particle filter. To see this clearly,
consider the following:



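$$\begin{aligned}
p(x_t, x_{t-1} / y_t) &\propto p(y_t / x_t, x_{t-1}, y_{t-1}) \, p(x_t, x_{t-1} / y_{t-1}) \\
&\propto p(y_t / x_t) \, p(x_t / x_{t-1}, y_{t-1}) \, p(x_{t-1} / y_{t-1}) \\
&\propto p(y_t / x_t) \, p(x_t / x_{t-1}) \, p(x_{t-1} / y_{t-1})
\end{aligned} \tag{8}$$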



Equation (8) shows that, indeed, the regular steps of the particle filter generate an
approximation of the joint density $p(x_t, x_{t-1} / y_t)$. After re-sampling from $p(x_{t-1} / y_{t-1})$,
propagating these samples with $p(x_t / x_{t-1})$, and calculating the weights $p(y_t / x_t)$, the
set of resulting sample pairs $(x_t^i, x_{t-1}^i)$ with correcting weights $p(y_t / x_t^i)$ forms a valid
set of samples from the joint density $p(x_t, x_{t-1} / y_t)$. Considering that $p(x_{t-1} / y_t)$ is
just a marginal of this joint distribution, the set of weighted samples $x_{t-1}^i$ are valid
samples from it.

The previous description suggests that an adaptive version of the particle filter that
uses $y_t$ in the allocation of the samples can be constructed with an $O(2N)$ algorithm.
First, $N$ particles are used to generate the importance function $p(x_{t-1} / y_t)$. Then,
starting from this importance function, another $N$ particles are used to generate the
desired posterior $p(x_t / y_t)$. Figure 1 uses a 2-D tracking example to illustrate the
main steps involved in this adaptive version of the particle filter. In this example, each
hypothesis about the position of a target is given by a bounding box defined by height,
width, and the coordinates of its center.

Figure 1a) shows the initial set of weighted hypotheses used to estimate a hypothetical
posterior distribution at time $t-1$. This hypothetical posterior consists of three
main clusters of bounding boxes, which are labeled with identification numbers to
facilitate their reference within the text. For each of the rectangular hypotheses, its gray
level intensity is proportional to its probability, and its thickness is proportional to the
number of times that the hypothesis is repeated in the sample set.

Fig. 1. Schematic view of the different steps involved in the modified version of the particle filter
that includes an updated version of the dynamic prior using the current observation $y_t$. For each
rectangular hypothesis, its gray level intensity is proportional to its probability, and its thickness
is proportional to the number of times that the hypothesis is repeated in the sample set. The
algorithm is equivalent to the application of two iterations of the particle filter. The first iteration
is shown in Figures a)-d). It provides an estimate of $p(x_{t-1} / y_t)$, which corresponds to an updated
version of the prior $p(x_{t-1} / y_{t-1})$ including the last observation $y_t$. Next, Figures e)-h) show the
second iteration of the filter. This corresponds to a modified version of the regular particle filter
using the updated version of the prior to run the re-sampling step that determines the allocation
of the samples to estimate $p(x_t / y_t)$.



In the upper part, Figures 1a)-d) sketch the first three steps of the algorithm. These
correspond to the regular steps of the particle filter. First, the transition between Figures
1a) and b) shows the re-sampling step. Because the particles in cluster 1 have higher
probability, they are re-sampled many times. In contrast, only a few of the particles in
clusters 2 and 3 survive the re-sampling.

The transition between Figures 1b) and c) shows the sampling step. The example
assumes a stationary and isotropic motion model, such as a Gaussian model of zero
mean and low variance. Using this type of motion model, the resulting predictive
function or dynamic prior $p(x_t / y_{t-1})$ is characterized by a massive exploration of the
state space around cluster 1.

The transition between Figures 1c) and d) shows the weighting step. To illustrate the
relevance of the algorithm proposed here, the example assumes a mismatch between the
dynamic prior and the likelihood function. In this way, while the prior mainly supports
the exploration of the area around cluster 1, the likelihood function gives higher
support to the particles in cluster 2. Figure 1d) shows the resulting representation of the
posterior at time $t$. The representation is highly inefficient: while many unlikely
particles are allocated around cluster 1, just a few highly likely particles are allocated in
the critical area around cluster 2.

Figures 1e)-h) sketch the novel steps of the algorithm. These are similar to the regular
steps of a particle filter, with the important modification that the original prior at time
$t-1$ is enhanced by including information about the most recent observation $y_t$. Using
the estimate of the joint conditional density $p(x_t, x_{t-1} / y_t)$ built by the regular
steps of the particle filter, the algorithm discards the samples $x_t^{i}$, leaving an
estimate of $p(x_{t-1} / y_t)$. This density is the starting point for the next step of the
algorithm, denoted as importance re-sampling.

The transition between Figures 1e) and f) shows the importance re-sampling step.
This step re-allocates the particles towards areas associated with high probability
under the likelihood function. Using importance sampling, the new samples from
$p(x_{t-1} / y_{t-1})$ are drawn from the estimate of $p(x_{t-1} / y_t)$. Each new sample
$x_{t-1}^{j}$ is weighted by the correction term $p(x_{t-1}^{i,j} / y_{t-1}) / p(y_t / x_t^{i})$.
The notation $x_{t-1}^{i,j}$ denotes that the new particle $x_{t-1}^{j}$ is a re-sampled
version of a particle $x_{t-1}^{i}$ from the weighted set used to estimate $p(x_{t-1} / y_t)$.

In contrast to the representation of the prior shown in Figure 1b), the new
representation shown in Figure 1f) has shifted the allocation of the samples toward cluster 2.
It is important to note that, although these representations allocate the samples in a
different way, they represent the same pdf. The difference lies in the way that they exploit
the duality between number of samples and weights to represent a density function.
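The duality between number of samples and weights can be checked on a small discrete case. The following is only an illustrative sketch: the three states, their probabilities, and the two sample allocations are hypothetical numbers chosen for the example, not values from the experiments.

```python
def weighted_mean(samples, weights):
    """Mean of a weighted sample set; weights need not be normalized."""
    total = sum(weights)
    return sum(x * w for x, w in zip(samples, weights)) / total


# Hypothetical target pdf over three states.
states = [0.0, 1.0, 2.0]
probs = [0.5, 0.3, 0.2]

# Representation A: probability is carried by the number of copies
# of each state, and every sample has the same weight.
counts_a = [5, 3, 2]
samples_a = [s for s, c in zip(states, counts_a) for _ in range(c)]
weights_a = [1.0] * len(samples_a)

# Representation B: more samples are allocated to the less probable states,
# and the weights compensate (weight = probability / number of copies).
counts_b = [2, 4, 4]
samples_b = [s for s, c in zip(states, counts_b) for _ in range(c)]
weights_b = [p / c for p, c in zip(probs, counts_b) for _ in range(c)]

mean_a = weighted_mean(samples_a, weights_a)
mean_b = weighted_mean(samples_b, weights_b)
```

Both sample sets use ten samples and yield the same expectation, although they place the samples differently.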

The transitions between Figures 1f) and g) and between Figures 1g) and h) show
the final two steps of the algorithm. These are equivalent to the sampling and weighting
steps of the regular particle filter, but carrying the weights obtained in the importance
re-sampling step. Figure 1h) shows the final estimate of the posterior at time $t$. In
contrast to the representation of the posterior given by the regular particle filter (Figure
1d)), the reallocation of the samples toward cluster 2 increases the efficiency of the
representation. This can be seen in the more even distribution of the gray level intensities
of the importance weights.
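The steps walked through above can be sketched for a one-dimensional state. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the Gaussian motion model, the Gaussian likelihood, the parameter values, and the names `gaussian_pdf` and `adaptive_step` are all hypothetical. The input particles are assumed to be already re-sampled, so they carry uniform weights.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, used here as an assumed likelihood p(y_t / x_t)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def adaptive_step(prev_particles, y_t, rng, motion_std=1.0, obs_std=0.5):
    """One adaptive iteration over equally weighted samples of p(x_{t-1} / y_{t-1})."""
    n = len(prev_particles)

    # First pass (regular filter): sample from the motion model and weight by
    # the likelihood. The pairs (x_t^i, x_{t-1}^i) with these weights
    # approximate p(x_t, x_{t-1} / y_t); keeping only x_{t-1}^i leaves an
    # estimate of p(x_{t-1} / y_t).
    first_pass = [x + rng.gauss(0.0, motion_std) for x in prev_particles]
    # The floor avoids an all-zero weight vector if the likelihood underflows.
    lik = [max(gaussian_pdf(y_t, x, obs_std), 1e-300) for x in first_pass]

    # Importance re-sampling: draw indices from the estimate of p(x_{t-1} / y_t).
    chosen = rng.choices(range(n), weights=lik, k=n)

    # Second pass: propagate again and weight by the likelihood, carrying the
    # correction term. The prior samples had uniform weight, so the correction
    # is proportional to 1 / p(y_t / x_t^i).
    particles, weights = [], []
    for i in chosen:
        x_new = prev_particles[i] + rng.gauss(0.0, motion_std)
        particles.append(x_new)
        weights.append(gaussian_pdf(y_t, x_new, obs_std) / lik[i])

    total = sum(weights)
    if total == 0.0:
        return particles, [1.0 / n] * n
    return particles, [w / total for w in weights]


rng = random.Random(0)
prev = [rng.gauss(0.0, 1.0) for _ in range(200)]
particles, weights = adaptive_step(prev, y_t=3.0, rng=rng)
```

The sketch uses 2N propagations per iteration, matching the cost stated in the text; a practical version would run the second pass only when the sample allocation is judged inefficient.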

In the previous algorithm, the overlap with the first three steps of the regular
particle filter provides a convenient way to perform an online evaluation of the benefits
of updating the dynamic prior with the last observation. Updating the dynamic prior can
be beneficial when the dynamic prior and the posterior distribution match poorly; when
these distributions agree, however, the extra processing offers no real advantage and
should thus be avoided. To our current knowledge, this issue has not been addressed before.

The basic idea is to run the regular particle filter, evaluating at the same time the
efficiency in the allocation of the samples. If the efficiency is low, the algorithm uses the
estimate of $p(x_{t-1} / y_t)$ given by the regular particle filter as the importance function
to update the dynamic prior. The intuition behind this idea is to quantify, at each iteration
of the particle filter, the trade-off between continuing to draw samples from a known
but potentially inefficient importance function versus incurring the cost of building a
new importance function that provides a better allocation of the samples. The important
observation is that, once the regular particle filter reaches an adequate estimate, it can
be used to estimate both the posterior distribution $p(x_t / y_t)$ and the updated importance
function $p(x_{t-1} / y_t)$, which is the key to avoiding a significant additional
computational load.

Fig. 2. Tracking results for the ball and the left-side child for frames 1, 5, and 14. The bounding
boxes correspond to the most probable hypotheses in the sample set used to estimate the posterior
distributions.

Considering that the efficiency of the sample allocation depends on how well the
dynamic prior resembles the posterior distribution, an estimate of the distance
between these two distributions is a suitable index to quantify the effectiveness of the
propagation step. We found [5] a convenient way to estimate the Kullback-Leibler
divergence (KL-divergence) between these distributions, and in general between a target
distribution $p(x)$ and an importance function $q(x)$:

$$KL(p(x), q(x)) \approx \log(N) - H(w_i) \qquad (9)$$

Equation (9) has an intuitive interpretation. It states that, for a large number of particles,
the KL-divergence between the dynamic prior and the posterior distribution measures
how far the entropy of the distribution of the weights, $H(w_i)$, is from $\log(N)$, the
entropy of a uniform distribution over the $N$ particles.
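Equation (9) is cheap to evaluate from the importance weights alone. The sketch below assumes the weights are normalized to sum to one before the entropy is taken; the function names and the threshold value used in the example are hypothetical, since the text does not state the threshold.

```python
import math


def kl_estimate(weights):
    """Estimate of Equation (9): log(N) minus the entropy of the normalized weights."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    entropy = -sum(w * math.log(w) for w in normalized if w > 0.0)
    return math.log(len(normalized)) - entropy


def should_adapt(weights, threshold):
    """Adapt the importance function only when the estimated divergence is large."""
    return kl_estimate(weights) > threshold


uniform = kl_estimate([1.0] * 8)                 # evenly spread weights
degenerate = kl_estimate([1.0, 0.0, 0.0, 0.0])   # one particle holds all the weight
decision = should_adapt([1.0, 0.0, 0.0, 0.0], threshold=0.5)
```

Uniform weights give an estimate of zero, whereas a single dominant particle among N gives log(N), the largest value the estimate can take.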

4 Application 

To illustrate the advantages of our method we use a set of frames of a video sequence 
consisting of two children playing with a ball. In this case, the goal is to keep track of 
the positions of the ball and the left side child. Each hypothesis about the position of a 
target is given by a bounding box defined by height, width, and the coordinates of its 
center. The motion model used for the implementation of the particle filter corresponds 
to a Gaussian function of zero mean and standard deviations of 20 for the center of each 
hypothesis and 0.5 for its width and height. 
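As an illustration of this motion model, the following sketch propagates a bounding-box hypothesis with zero-mean Gaussian noise, using the standard deviations quoted above. The `Box` type, the starting values, and the lack of any clamping of width and height are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Box:
    cx: float      # x coordinate of the center
    cy: float      # y coordinate of the center
    width: float
    height: float


CENTER_STD = 20.0  # standard deviation for the center coordinates (from the text)
SIZE_STD = 0.5     # standard deviation for width and height (from the text)


def propagate(box, rng):
    """Draw a new hypothesis from the zero-mean Gaussian motion model."""
    return Box(
        cx=box.cx + rng.gauss(0.0, CENTER_STD),
        cy=box.cy + rng.gauss(0.0, CENTER_STD),
        width=box.width + rng.gauss(0.0, SIZE_STD),
        height=box.height + rng.gauss(0.0, SIZE_STD),
    )


rng = random.Random(1)
start = Box(cx=160.0, cy=120.0, width=30.0, height=30.0)  # hypothetical hypothesis
cloud = [propagate(start, rng) for _ in range(2000)]
mean_cx = sum(b.cx for b in cloud) / len(cloud)
mean_width = sum(b.width for b in cloud) / len(cloud)
```

Because the model is stationary, the propagated hypotheses stay centered on the previous position, which is why it fits a slowly moving target better than a fast one.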

Figure 2 shows the results of tracking the targets using a regular version of the 
particle filter that includes an adaptive selection of the number of particles needed to 
achieve a successful tracking with a specified confidence level [5]. The labels over the 
bounding boxes correspond to the visual algorithms used to track the targets (see [5] for 
more details). 

The tracking results show that the algorithm needs a roughly constant number of
particles to track the child during the entire sequence. This is because the child moves
only slightly and slowly around a center position during the entire sequence. Therefore,
the stationary Gaussian motion model is highly accurate and there is no real advantage in
improving the propagation function.

Fig. 3. Number of particles used at each iteration to track the ball. Left: without adapting the
importance function. Right: adapting the importance function.



In the case of the ball, the situation is different: the number of particles needed
to achieve the desired error level increases sharply during the period in which the ball
travels from one child to the other (Frames 3 to 7). During this period the ball has a
large and fast motion, so the Gaussian motion model is a poor approximation of
the real motion. As a consequence, there is a large mismatch between the dynamic prior
and the posterior distribution. This produces an inefficient allocation of the samples, and
the estimate needs a larger set of samples to populate the relevant parts of the posterior.
Figure 3-left shows the number of particles needed to estimate the posterior distribution
of the ball at each frame, to achieve a certain predefined level of accuracy [5].

Applying the modified version of the particle filter and setting a suitable value for
the threshold on the KL-divergence, the tracking engine decides to adapt the importance
function at all the frames where the ball travels from one child to the other (Frames 3-7).
Figure 3-right shows the number of particles needed to achieve a successful tracking
of the ball with the same specified confidence level used before, but adapting the
importance function. Compared with the non-adaptive case, Frames 3 to 7 show a
significant reduction in the number of samples, due to their better allocation.

Figure 4 compares the location of the resulting set of samples to estimate the pos- 
terior distribution of the position of the ball at Frame 5. For clarity only the (x,y) 
coordinates of the center of each hypothesis are shown in the graphs. In the case of no 
adaptation the mismatch between the dynamic prior and the likelihood produces many 
wasted particles allocated in the tails of the posterior distribution. In contrast, in the 
adaptive case the use of the current observation produces a re-allocation of the samples 
towards areas of high likelihood, reducing the number of wasted samples. 



Fig. 4. Estimate of the posterior distribution of the position of the center of the ball at Frame 5.
Left: without adapting the importance function. Right: adapting the importance function.



5 Conclusions

In this paper we presented a method to adaptively propagate the set of samples used
by the particle filter. The method can be added to the regular particle filter as an extra
step without significant computational overhead. In contrast to previous algorithms, our
method includes the construction of a suitable importance function and a mechanism to
identify when the adaptation of the importance function may be beneficial. To achieve
this last goal, we use an estimate of the KL-divergence between the dynamic prior
and the posterior distribution. The results of testing the new method for tracking targets
in real video sequences show the advantages of the adaptive version with respect to
the regular particle filter. Using the adaptive version it was possible to efficiently track
targets with different motions using a highly general motion model.



Abstract. In decision support systems for Intensive Care Units (ICU), the data
management subsystem plays an essential role since the data have a heterogeneous
origin. The temporal dimension of the data is also very important in capturing
the intrinsic dynamism in patients' evolution data. This situation requires
the integration of data in a single platform in which time representation and
management techniques should be considered. The selection of a data model that
simplifies the expression of (complex) queries relies on an efficient internal
representation of data for processing updates and queries on temporal data. On the
other hand, due to the large amount of data regarding patient evolution, a Database
Management System (DBMS) is required. Therefore, the integration of a DBMS
with a temporal reasoner is required if temporal reasoning capabilities on
patients' evolution data are to be provided. This paper presents the integration of a
DBMS with a generic fuzzy temporal reasoner (FuzzyTIME).



1 Introduction 

In recent years, static consultation systems in medical domains have been replaced by
systems that can deal with the temporal dimension needed to capture the patient's
evolution over time, as in the area of decision support systems in Intensive Care Units (ICU)
[1]. Therefore, the inclusion of modules which make temporal reasoning on patients'
evolution data possible is essential. An example of such a module is FuzzyTIME [2], a
generic temporal reasoner based on the Fuzzy Temporal Constraint Network (FTCN)
formalism [3], which can be easily integrated into any application that requires managing
temporal information.

From the temporal reasoning perspective, temporal knowledge, usually captured by
a constraint network, can be represented more effectively if the network is
complemented by a database for storing the information typically used to label the nodes
of the network [4].

A decision support system for the ICU domain has to deal with a large amount of
data provided by both the signals monitored and clinical history data. In such a context,
the integration of databases in these systems is essential.

There have been several approaches to the problem of integrating temporal
information and databases, from the point of view of both artificial intelligence and databases.
From the point of view of database technology, the introduction of time information
into a database can be carried out in several ways, although the most usual is the
introduction of a time stamp attribute with some fixed granularity [5]. This attribute can
be managed explicitly by the database management system, so the database becomes a
temporal database, defined as a collection of facts associated with one or more temporal
contexts. In this line of work, TSQL2 [6] has an implicit time model in which the time
for the tuples is not specified and temporal consistency is guaranteed. The main
problem with this approach lies in how to solve queries in which the exact absolute time is
not specified in the database [7]. In decision support systems for diagnosis in the ICU it is
necessary to deal explicitly with qualitative and quantitative temporal constraints, and
this kind of solution is therefore not well adapted to the problem.

It is not unusual to find information systems in which absolute time is managed, but
what is unusual is to find systems that manage qualitative and quantitative constraints.
Some attempts are currently being made to apply qualitative temporal constraints to
databases, in both constraint and relational databases [8, 9]. This kind of reasoning is
essential, for example, in a temporal abstraction process (an important task in medical
domains) like the one described in [10], where there is an explicit treatment of qualitative
relations and time is considered to be a variable. A temporal reasoner must be able to infer
temporal relations, that is, deal with date arithmetic, temporal relations and temporal
granularities.

Brusoni, in LATER [5], gives a solution to this problem, but he deals with it from
the database perspective: he proposes a redefinition of the relational algebra operators
for managing qualitative temporal constraints between tuples. In this work, we present a
model for the integration of a database with a general purpose temporal reasoner,
FuzzyTIME, in order to enable mechanisms of temporal reasoning on the elements stored in a
database. The module resulting from this integration is the core subsystem of ACUDES
[11], a general purpose architecture for decision support systems for ICUs which
provides temporal reasoning capabilities. Moreover, FuzzyTIME uses Possibility Theory
for solving queries about necessity and possibility, by means of a fuzzy extension of the
classic modal operators MAY and MUST, which allows us to obtain a value between
0 and 1 as the answer to a temporal query. In our proposal, the temporal reasoning
module is plugged on top of the Database Management System (DBMS), thus contributing
to the treatment of imprecision and uncertainty (since the temporal constraints used in
the reasoner are fuzzy).

The rest of the paper is structured as follows. A concise description of the FuzzyTIME
temporal reasoner is given in section 2. The structure of the database, as well as the
different types of temporal information considered, is put forward in section 3. In the
same section, the integration of FuzzyTIME with the database is introduced, with
special attention to the kind of queries that can be solved by the system.
Finally, some conclusions and discussion are presented.



2 Temporal Reasoning Module: FuzzyTIME 

FuzzyTIME is based on a three-layered architecture, which allows us to separate the
interface for querying and updating temporal information (interface layer) from the
layer where temporal entities and relations are managed (temporal world layer), and
from their low level representation (FTCN layer). An expressive language which allows
the formulation of complex queries involving disjunctions of relations is provided in the
upper layer. The proposed language is an extension of the one presented in [12].

The second layer is called temporal world and contains a high level representation of
temporal entities and relations. In FuzzyTIME two kinds of temporal entities have been
considered: time instants and intervals (a time interval is decomposed into an ordered pair
of two points). Time entities can be related to each other by means of both qualitative
and quantitative relations: qualitative point-to-point, qualitative point-to-interval (both
formalised by Van Beek [13]), Allen qualitative interval-to-interval [14] and
quantitative point-to-point [3]. The temporal relations allowed in the temporal reasoner are only
convex disjunctions, thus obtaining a trade-off between expressive power and efficiency.
In this level, convexity is checked at the same time as the high level
relations of the language are translated into metric point-to-point relations.

The third layer, which contains the low level representation of the temporal entities
and relations, is based on the FTCN (Fuzzy Temporal Constraint Network) [3]
formalism. The representation is a graph composed of nodes (temporal variables representing
time points) and fuzzy metric constraints between nodes (fuzzy numbers representing
the temporal distance between two points). A minimal network that represents the
minimal domains for temporal variables is calculated here. This network and the use of a
local propagation mechanism (as in LATER [5]) allow us to achieve efficient query
answering. The use of fuzzy numbers as constraints allows us to make use of
Possibility Theory to solve queries about necessity and possibility, by means of a fuzzy
extension of the classic modal operators MAY (Π) and MUST (N), thus obtaining a real
value between zero and one as the result of a query.
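To make the graded answers concrete, the sketch below evaluates the standard possibility and necessity measures of Possibility Theory for a trapezoidal fuzzy number on a discretized time axis. It is an illustration only: the text does not describe how FuzzyTIME computes MAY and MUST internally, and the example distribution, the query, and the grid are hypothetical.

```python
def trapezoid(x, a, b, c, d, h=1.0):
    """Possibility degree of x under the fuzzy number (a, b, c, d, h)."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return h
    if x < b:
        return h * (x - a) / (b - a)
    return h * (d - x) / (d - c)


def possibility(pi, mu, grid):
    """MAY: sup over the grid of min(pi(x), mu(x))."""
    return max(min(pi(x), mu(x)) for x in grid)


def necessity(pi, mu, grid):
    """MUST: inf over the grid of max(1 - pi(x), mu(x))."""
    return min(max(1.0 - pi(x), mu(x)) for x in grid)


def distance(x):
    # Hypothetical fuzzy temporal distance between two points.
    return trapezoid(x, 2, 3, 5, 6)


def at_most(limit):
    # Crisp query constraint "distance <= limit".
    return lambda x: 1.0 if x <= limit else 0.0


grid = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
may = possibility(distance, at_most(5.25), grid)
must = necessity(distance, at_most(5.25), grid)
```

For this query the constraint is fully possible but only partially necessary, which is the kind of value between zero and one that the text describes.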

To illustrate the syntax of the expressions and the translation process we provide
a simple example. Both the assertions and the queries follow the same base format,
(TemporalEntity TemporalConstraint TemporalEntity). For example, let us suppose we
write the following expression: (IntervalA BEFORE EQUALS PointB); this relation can
be translated into a pair of constraints: (PointBeginA BEFORE EQUALS PointB) and
(PointEndA BEFORE EQUALS PointB), where PointBeginA and PointEndA are the
beginning and the end of IntervalA. As a second step, the QUAN operator [15] is used to
translate the qualitative relation BEFORE EQUALS to a fuzzy number represented by
means of a trapezoidal possibility distribution (Figure 1). In this case, the BEFORE
constraint is represented as (1, 1, +∞, +∞, 1) and the EQUALS constraint is (0, 0, 0, 0,
1), thus the union of both constraints is (0, 0, +∞, +∞, 1).
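The example above can be reproduced with a short sketch. It assumes that the union of two trapezoidal constraints is taken as their convex hull, which matches the result quoted in the text; the table of relations and the function names are hypothetical, and only the single interval-to-point case from the example is handled.

```python
import math

INF = math.inf

# Trapezoids (a, b, c, d, h) quoted in the text for the two relations.
QUAN = {
    "BEFORE": (1, 1, INF, INF, 1),
    "EQUALS": (0, 0, 0, 0, 1),
}


def union(t1, t2):
    """Convex hull of two trapezoidal distributions (an assumed union rule)."""
    return (
        min(t1[0], t2[0]),
        min(t1[1], t2[1]),
        max(t1[2], t2[2]),
        max(t1[3], t2[3]),
        max(t1[4], t2[4]),
    )


def translate(interval, relations, point):
    """Translate (IntervalX r1 r2 ... PointY) into two metric point constraints."""
    constraint = QUAN[relations[0]]
    for name in relations[1:]:
        constraint = union(constraint, QUAN[name])
    return [
        ("PointBegin" + interval, constraint, point),
        ("PointEnd" + interval, constraint, point),
    ]


constraints = translate("A", ["BEFORE", "EQUALS"], "PointB")
```

Both end points of the interval receive the same metric constraint toward PointB, as in the pair of constraints listed in the text.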



3 Database Structure 

Without loss of generality, we can say that data may be temporal or atemporal,
depending on whether they are associated with time entities or not. In the domain of our
application (ICU), the system's structure must be adapted to the different sources of
information, mainly those provided by monitored signals and patients' clinical data. Atemporal
data are used to represent information about the patients' history, without any specific
relation to time, for example, age or sex.

Fig. 1. Possibility distribution associated to the fuzzy number (a,b,c,d,h).

Fig. 2. Elements of the temporal abstraction process.



Temporal data include a time specification, which can be given as an absolute
date or as a temporal relation (by means of any expression allowed in the temporal
language). As regards its temporal nature, information to be stored must belong to one of
the following three main categories: observations (concrete measurements taken on any
observable patient feature), states (produced by the temporal abstraction process over
the set of observations, and representing a time interval in which the value of a
feature does not change), and events (representing the beginning and the end of a state);
for the relation between these structures see Figure 2. A conventional approach [10] is
followed for clinical history data abstraction and the technique described in [16] is applied
for signal abstraction.

The different tables in the database have been designed according to the previous
categories of temporal concepts. Thus, three tables are considered: observations, states
and events. The basic structure of tuples comprises three components: concept,
attribute, and value. These elements correspond to the concept being dealt with, the
name of one attribute belonging to the concept, and the value of that attribute.
Observation tuples include an absolute date, indicating the time at which the observation was
taken, whereas a temporal reference (a reference to an entity already present in the
reasoner) is associated with the tuples in the states and events tables. This database structure
constitutes a general structure that can be extended with any attributes imposed by the
application domain. FuzzyTIME is well adapted to these structures since it allows
the application to deal with different temporal entities (points and intervals) and with
both qualitative and quantitative information, including absolute dates and temporal
constraints.
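The three-table structure can be sketched with an in-memory SQLite database. The column names, the text types, the sample rows, and the presence of a `reference` column in the observations table are assumptions for illustration; the text specifies only the concept, attribute, and value components plus a date or temporal reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE observations (
        concept   TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT NOT NULL,
        obs_date  TEXT NOT NULL,   -- absolute date of the measurement
        reference TEXT             -- identifier of the entity in the reasoner
    );
    CREATE TABLE states (
        concept   TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT NOT NULL,
        reference TEXT NOT NULL    -- interval already present in the reasoner
    );
    CREATE TABLE events (
        concept   TEXT NOT NULL,
        attribute TEXT NOT NULL,
        value     TEXT NOT NULL,
        reference TEXT NOT NULL    -- time point already present in the reasoner
    );
    """
)

# Hypothetical rows: one observation with an absolute date, one abstracted state.
conn.execute(
    "INSERT INTO observations VALUES (?, ?, ?, ?, ?)",
    ("pain", "intensity", "high", "2003-11-12T08:00", "O1"),
)
conn.execute(
    "INSERT INTO states VALUES (?, ?, ?, ?)",
    ("pain", "intensity", "high", "S1"),
)
conn.commit()

state_rows = conn.execute(
    "SELECT concept, attribute, value, reference FROM states"
).fetchall()
```

Domain-specific attributes could be added as further columns without changing the three core components.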



A Model for Fuzzy Temporal Reasoning on a Database 






A graphical representation of the interrelation between the database and the
internal representation of relations, points and intervals is shown in Figure 3. The temporal
world layer is represented in the middle of the figure; a high level representation of
entities (intervals like S and points like E1 or E2 in the figure) and relations (point-to-point
either metric or qualitative, point-to-interval, interval-to-point and interval-to-interval)
is maintained in that layer. This temporal world has a low level representation in the
FTCN layer, where intervals are decomposed into an ordered pair of points and where
qualitative relations are translated into metric ones. All the data relating to observations,
states and events are stored in the database and can be retrieved with a reference that
has a counterpart in the entities of the temporal world.

Fig. 3. Integration of FuzzyTIME with the database.



3.1 Data Updates 

There are two kinds of updating operations: temporal and atemporal. First of all,
semantic consistency must be checked against the domain ontology for both kinds of update.
In our concrete case, the architecture includes an ontology server that offers the ICU
ontology to the rest of the modules. Once the semantic consistency is checked, the updates
can be realised. In the case of atemporal updates, the elements can be introduced into
the database with a simple SQL statement.

Additionally, temporal consistency with the data already stored must be checked for
new temporal updates; this operation is done automatically by FuzzyTIME. Depending
on the kind of temporal information, two different cases can be found: updating an
absolute date or updating information on temporal variables. In the former case, for
example in the case of observations, consistency checking is not necessary. The
temporal information is asserted in FuzzyTIME and the temporal variable created by that
insertion can be used as the identifier for the temporal variable associated with the tuple in
the database. For example, the expression OBSERVATION (pain, localisation,
precordial) BEFORE 8:00 AM is asserted as such. In the case of references to already
defined temporal variables, the first step consists of retrieving these variables; then the
consistency check must be performed and, finally, the tuple is inserted into the
database if the temporal information to be asserted is consistent with that previously
stored in the database.
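The update procedure for tuples that refer to already defined temporal variables can be sketched as follows. `StubReasoner` is a hypothetical stand-in used only to make the control flow runnable: it is not the FuzzyTIME interface, its consistency check is a toy, and the semantic check against the ontology is omitted.

```python
import sqlite3


class StubReasoner:
    """Hypothetical stand-in for the temporal reasoner (not the FuzzyTIME API)."""

    def __init__(self):
        self.before = set()  # asserted (earlier, later) pairs of entity names

    def is_consistent(self, earlier, later):
        # Toy check: only a direct contradiction of an asserted pair is rejected.
        return earlier != later and (later, earlier) not in self.before

    def assert_before(self, earlier, later):
        self.before.add((earlier, later))


def insert_state(conn, reasoner, concept, attribute, value, reference, later_entity):
    """Insert a state tuple only if `reference BEFORE later_entity` is consistent."""
    if not reasoner.is_consistent(reference, later_entity):
        return False
    reasoner.assert_before(reference, later_entity)
    conn.execute(
        "INSERT INTO states VALUES (?, ?, ?, ?)",
        (concept, attribute, value, reference),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE states (concept TEXT, attribute TEXT, value TEXT, reference TEXT)"
)
reasoner = StubReasoner()
accepted = insert_state(conn, reasoner, "pain", "intensity", "high", "S1", "S2")
rejected = insert_state(conn, reasoner, "fever", "level", "high", "S2", "S1")
stored = conn.execute("SELECT COUNT(*) FROM states").fetchone()[0]
```

The second update contradicts the first assertion, so it is refused and the database is left unchanged.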

3.2 Concept History Functions 

Once the structure of the database has been designed, the original language provided in
FuzzyTIME, which is strictly temporal, must be extended in order to access the data
stored in the database. To begin with, the definition of basic operations (such as LAST,
FIRST, NEXT, PREVIOUS, and NTH) is needed to browse data in a single concept
history. Note that a topological order is established on data belonging to the same history,
but this is not the case for data corresponding to different concept histories.

The argument for the FIRST and LAST functions is a tuple. This tuple can take
several forms, e.g. with a wildcard: the tuple (concept, attribute, *) returns the first/last
event (observation, or state) of the history associated with attribute, whatever its value;
or with a list of values: the tuple (concept, attribute, NOT {v1, ..., vn}) returns the
first/last event (observation, or state) of the history whose value does not match any of
the specified ones. With these functions, an expression like LAST OBSERVATION (pain,
intensity, NOT {high}) retrieves the last observation of pain whose intensity is not
high, i.e., either moderate or low.

The remaining functions, NEXT, PREVIOUS and NTH, allow the user to go through
a history of events, states or observations, and can also be used in several ways:

- [NEXT | PREVIOUS] (concept, attribute, value, reference) returns the following/
previous event (observation, or state) of (concept, attribute, value, reference) in the
history.

- NEXT and PREVIOUS can also be applied to the result of the FIRST and LAST
functions.

- NTH (concept, attribute, value) returns the event, state, or observation in the nth
position of the history list.

All the previously described functions return a database tuple that can be used in the
queries already defined in the temporal reasoning module. These functions return an
UNKW (unknown) value when the concept being queried is undefined. With
these simple functions, the user can go through a history in a simple loop with three
steps: (1) retrieve the first tuple of the set, (2) establish the condition of the loop as "until
UNKW is returned", (3) retrieve the next tuple by means of the NEXT function.
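The three-step loop can be sketched over an in-memory history. The list-based storage, the sample tuples, the use of `None` for UNKW, and the function signatures are assumptions made for the example; in the system described here the tuples would come from the database.

```python
UNKW = None  # stand-in for the UNKW (unknown) value

# One concept history, ordered from oldest to newest (hypothetical data).
HISTORY = [
    ("pain", "intensity", "low", "O1"),
    ("pain", "intensity", "high", "O2"),
    ("pain", "intensity", "moderate", "O3"),
]


def _matches(row, concept, attribute, value="*", excluded=()):
    return (
        row[0] == concept
        and row[1] == attribute
        and (value == "*" or row[2] == value)
        and row[2] not in excluded
    )


def first(history, concept, attribute, value="*", excluded=()):
    for row in history:
        if _matches(row, concept, attribute, value, excluded):
            return row
    return UNKW


def last(history, concept, attribute, value="*", excluded=()):
    return first(list(reversed(history)), concept, attribute, value, excluded)


def next_of(history, row):
    """Tuple that follows `row` within the same concept history, or UNKW."""
    same = [r for r in history if r[:2] == row[:2]]
    position = same.index(row)
    return same[position + 1] if position + 1 < len(same) else UNKW


# (1) retrieve the first tuple, (2) loop until UNKW, (3) move to the next tuple.
visited = []
row = first(HISTORY, "pain", "intensity")
while row is not UNKW:
    visited.append(row[3])
    row = next_of(HISTORY, row)

# Counterpart of LAST OBSERVATION (pain, intensity, NOT {high}).
not_high = last(HISTORY, "pain", "intensity", excluded=("high",))
```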

3.3 Temporal Queries 

In the first instance, the temporal reasoner only accepts temporal queries [2], so we
have extended the language to cope with the operations imposed by its integration with
the database. As well as the modal operators, i.e. necessity and possibility, universal
and existential quantifiers have been introduced. These quantifiers allow us to deal with
the multiple matches produced as the result of a query to the database.

Queries, in their most basic form, comprise two operands and a temporal
relation (one of those defined in FuzzyTIME), and can be classified according to the
type of the operands. In the so-called level 0 queries, only the temporal entities (points
and intervals) that have already been defined in the module for temporal reasoning can
be included. The schema of this kind of query is:

- [MAY | MUST] ((concept, attribute, value, reference) constraint [entity | date |
(concept, attribute, value, reference)])

Thus, the first operand is any of the tuples mentioned (observations, states or events)
plus a temporal reference, whereas the second operand can be either a tuple, with the
same structure as the first operand, or the identifier of an entity or an absolute date. Both
qualitative and quantitative relations are allowed, depending on the type of the operand.

This kind of query is directly translated into FuzzyTIME queries by means of the
temporal references included in the database tuples, and the result is a degree of
possibility or necessity. Note that any element with an associated temporal reference can
be substituted by any of the functions specified in the previous section.

For example, an expression like “has the patient suffered the last strong pain within
the last days before admission?” can be written as MUST (LAST OBSERVATION (pain,
intensity, high) LESS_THAN 3 DAYS BEFORE ADMISSION). The next step is to retrieve
from the database the tuples that match the values given in the tuple (c, a, v);
the second operand is the entity called ADMISSION, which corresponds to a
time point already defined. These values are retrieved via a simple SQL query, such as
SELECT concept, attribute, value, reference FROM observation WHERE concept='pain' AND
attribute='intensity' AND value='high', which returns a result set from the database. Given
this result set, it is easy to find the entity that matches the temporal reference of each
of the valid tuples.
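
As a rough illustration of this retrieval step, the sketch below runs the query against an in-memory SQLite database. The table layout mirrors the column names used in the text; the function name `matching_references`, the choice of SQLite and the sample rows are assumptions made only for this example.

```python
import sqlite3
from typing import List


def matching_references(conn: sqlite3.Connection, concept: str,
                        attribute: str, value: str) -> List[str]:
    """Return the temporal references of all tuples matching (c, a, v)."""
    cur = conn.execute(
        "SELECT reference FROM observation "
        "WHERE concept = ? AND attribute = ? AND value = ?",
        (concept, attribute, value),
    )
    return [row[0] for row in cur.fetchall()]


def demo() -> List[str]:
    # Hypothetical sample data; the references t1..t3 are made-up identifiers.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation "
        "(concept TEXT, attribute TEXT, value TEXT, reference TEXT)"
    )
    conn.executemany(
        "INSERT INTO observation VALUES (?, ?, ?, ?)",
        [
            ("pain", "intensity", "high", "t1"),
            ("pain", "intensity", "low", "t2"),
            ("pain", "intensity", "high", "t3"),
        ],
    )
    refs = matching_references(conn, "pain", "intensity", "high")
    conn.close()
    return refs
```

Each returned reference would then be paired with the ADMISSION entity to form one basic temporal query.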

In the second type of queries, called level 1 queries, the first operand may be any
tuple (observations, states or events) without a temporal reference; as second operand,
any of the already defined temporal entities, an absolute date, or a tuple with a temporal
reference can be used. In this case, since the query is extended over a subset of
the temporal elements that comprise the history of the event, observation, or state on
which the query is performed, the universal and existential quantifiers can be used
in conjunction with the modal operators MAY and MUST. Queries of this kind can be
formalized as follows:

- [MAY | MUST] ([FORALL | EXISTS] (concept, attribute, value) constraint [entity |
date | (concept, attribute, value, reference)])

As in the previous case, elements associated with a temporal reference (states, events,
or observations) can be replaced by any of the functions described in Section 3.2. Solving
these queries involves an intermediate step in the translation of the query into the
FuzzyTIME language, since special attention must be given to quantifiers. The universal
quantifier is translated into a conjunction of queries. For example, to solve the
query “has the patient suffered all strong pains a few days before admission?”,
rewritten as MUST (FORALL OBSERVATION (pain, intensity, high) LESS_THAN 3 DAYS
BEFORE ADMISSION), the temporal references of all the tuples matching the OBSERVATION
predicate are retrieved. Let $\{t_1, \ldots, t_n\}$ be these references; the previous query
can then be translated into $N\left(\bigwedge_{i=1}^{n} \big(t_i \ (\text{LESS\_THAN 3 DAYS}) \ t_{admission}\big)\right)$, which can
be directly solved by FuzzyTIME. Queries involving existential quantifiers are translated
into a disjunction of basic queries instead of a conjunction.
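
The quantifier translation can be sketched as a small string-building function. The output syntax (`N`/`Pi` for the modal operators, `AND`/`OR` for the connectives) and the function name `translate` are assumptions chosen for readability; they are not the concrete FuzzyTIME query syntax.

```python
from typing import List


def translate(modal: str, quantifier: str, refs: List[str],
              constraint: str, entity: str) -> str:
    """Expand a quantified query into a conjunction or disjunction of basic queries."""
    # MUST maps to necessity, MAY to possibility (illustrative symbols only).
    operator = {"MUST": "N", "MAY": "Pi"}[modal]
    # FORALL becomes a conjunction, EXISTS a disjunction.
    connective = {"FORALL": " AND ", "EXISTS": " OR "}[quantifier]
    basic = [f"({t} {constraint} {entity})" for t in refs]
    return f"{operator}({connective.join(basic)})"
```

For instance, `translate("MUST", "FORALL", ["t1", "t2"], "(LESS_THAN 3 DAYS)", "t_admission")` produces one basic query per retrieved reference, joined by a conjunction under the necessity operator.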

4 Conclusions 

In this paper we have dealt with the integration of a general purpose module for tem- 
poral reasoning, FuzzyTIME, with a database where the domain information is stored. 
The extensible architecture of FuzzyTIME allows a seamless integration with any other 
module, in this case a database. The combined work of these components is necessary 
because both form part of a whole architecture, ACUDES [11], which benefits from (1)
the major features of the temporal reasoner, such as the ability to deal with qualitative 
and quantitative temporal constraints and the efficient query answering process, and (2) 
the ability of the database manager for managing large amounts of data. 

In the context of a decision support system in ICUs, more concretely in diagnosis, 
which is the domain where this system is applied, it is necessary to deal with temporal 
constraints. Therefore, a simple temporal extension of SQL like TSQL2 [6], which uses 
a timestamp column that represents a valid time for a tuple, is neither expressive nor
powerful enough to cope with the problem. 

In the solution proposed in [9], the authors extend the relational model by redefin- 
ing algebraic operators to deal with time and with (non fuzzy) qualitative temporal 
constraints between tuples. On the other hand, in [8] there is a theoretical approach 
to deal with indefinite temporal information on databases, but, again, only qualitative 
constraints are considered. 

In our proposal, a specialised temporal manager is integrated on top of the database, 
and the original language has been extended to interact with a general structure for 
database tables. The result is a system able to perform operations over qualitative or 
quantitative constraints, such as asserting new temporal constraints, checking the consistency
of constraints, and inferring new temporal constraints. Another contribution of
our work is the ability to use the possibility theory modal operators for querying the 
database. 

As regards the query language and the interaction with the database, Section 3.2 provides
basic functions for browsing concept histories and for retrieving specific occurrences.
Furthermore, the kind of queries formerly allowed by FuzzyTIME has also been
extended to take advantage of these new functions, making it possible to include exis- 
tential and universal quantifiers in queries involving both temporal and atemporal infor- 
mation. 
Abstract. In this work we develop a logic for formalizing qualitative reasoning.
This type of reasoning is generally used, for instance, when one has a lot of data 
from a real world example but the complexity of the numerical model suggests a 
qualitative (instead of quantitative) approach. 



1 Introduction 

When working with a real world problem one often encounters a lack of quantitative 
(numerical) information among the observed facts. A possible solution to this absence 
of information is simply to develop methods for reasoning under an incompletely speci- 
fied environment, and logic methods have been applied to give rise to reasoning schemes 
for fuzzy, imprecise and missing information. 

A different approach is to apply ideas from qualitative reasoning and, specifically, 
order of magnitude reasoning (OMR) introduced in [7] and later extended in [2-4, 9, 
11]. The underlying idea is that by reasoning in terms of qualitative ranges of variables, 
as opposed to precise numerical values, it is possible to compute information about the 
behavior of a system with very little information about the system and without doing 
expensive numerical simulation. 

Qualitative reasoning works with continuous magnitudes by means of a discretiza- 
tion so that it is possible to distinguish all the relevant aspects required by the con- 
text/specification (and only these aspects). 

The basis of OMR systems is computing with a set of coarse values, usually gener- 
ated as abstract representations of precise values. This is of course the same approach 
taken by any qualitative reasoning system. The distinctive feature of OMR is that the 
coarse values are generally of different orders of magnitude.

Depending on the way the coarse values are defined, different OMR calculi can be 
generated: It is usual to distinguish between Absolute Order of Magnitude (AOM) and 
Relative Order of Magnitude (ROM) models. The former is represented by a partition of 
the real line, in which each element of R belongs to a qualitative class. The latter type
introduces a family of binary order-of-magnitude relations which establish different 
comparison relations between numbers. This can be illustrated by means of several 
important examples. 

In [7] and extensions such as [2-4], coarse values are defined by means of ordering 
relations that express the distance between coarse values on a totally ordered domain 
in relation to the range they cover on that domain. Specifically, the seminal paper [7]
distinguishes three types of qualitative relations: x is close to y, x is negligible
w.r.t. y, and x is comparable to y. Later on, some extensions were proposed in order to
improve the original one with the inclusion of quantitative information, and to allow for
the control of the inference process [2-4].

There exist attempts to integrate both approaches as well, so that an absolute parti- 
tion is combined with a set of comparison relations between real numbers [9, 11]. For 
instance, it is customary to divide the real line into seven equivalence classes and use the
following labels to denote these equivalence classes of R:

    NL        NM        NS        PS        PM        PL
---------|---------|---------|---------|---------|---------
        -β        -α         0         α         β

The labels correspond to “negative large”, “negative medium”, “negative small”, 
“zero”, “positive small”, “positive medium” and “positive large”, respectively. The real 
numbers $\alpha$ and $\beta$ are the landmarks used to delimit the equivalence classes (the particular
criteria to choose these numbers would depend on the application in mind). In [9]
three binary relations ( close to, comparable, negligible ) were defined in the spirit of [7], 
but using the labels corresponding to quantitative values, and preserving coherence be- 
tween the relative model they define and the absolute model in which they are defined. 

Our aim in this paper is to develop a non-classical logic for handling qualitative 
reasoning with orders of magnitude. To the best of our knowledge, no formal logic 
has been developed to deal with order-of-magnitude reasoning. However, non-classical 
logics have been used as a support of qualitative reasoning in several ways. For instance,
the role of multimodal logics in dealing with qualitative spatio-temporal representations
is remarkable [10, 12], and in [8] branching temporal logics have been used to
describe the possible solutions of ordinary differential equations when we have limited 
information about a system. 

In this paper, as a starting point of our proposal, we will use an arbitrary set of
real numbers, not necessarily the whole real line, partitioned into three equivalence
classes: two classes formed by so-called observable numbers (positive or negative) and one
formed by non-observable numbers or infinitesimals¹ (including 0). In the class of
infinitesimals we will not distinguish between positive and negative numbers. The landmarks
are defined by a pair of numbers $\alpha^{+}$ and $\alpha^{-}$, and the equivalence classes are denoted as follows:

- $OBS^{+}$ (positive observable numbers, including $\alpha^{+}$)

- $OBS^{-}$ (negative observable numbers, including $\alpha^{-}$)

- $INF$ (infinitesimals)

   OBS⁻           INF           OBS⁺
-----------|-------------|-----------
          α⁻            α⁺

Once we have the equivalence classes in the real line, we can make comparisons 
between numbers by using binary relations such as 

1 This means elements too small to be observed. Not to be confused with the formal meaning in 
a hyper-real framework. 
- x is less than y, in symbols $x < y$

- x is less than and comparable to y, in symbols $x \sqsubset y$.

where $\sqsubset$ is a restriction of the usual order of the real numbers (<) to numbers belonging
to the same equivalence class.
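
The partition and the comparability relation can be illustrated with a few lines of code. This is a minimal sketch under the assumption that the two landmarks belong to the observable classes, as stated in the list of classes above; the function names and the sample landmark values are hypothetical.

```python
def qualitative_class(x: float, neg_alpha: float, pos_alpha: float) -> str:
    """Classify x as negative observable, infinitesimal or positive observable."""
    if x <= neg_alpha:       # the negative landmark itself counts as observable
        return "OBS-"
    if x >= pos_alpha:       # the positive landmark itself counts as observable
        return "OBS+"
    return "INF"


def comparable_less(x: float, y: float, neg_alpha: float, pos_alpha: float) -> bool:
    """x is less than and comparable to y: x < y within one equivalence class."""
    same_class = (qualitative_class(x, neg_alpha, pos_alpha)
                  == qualitative_class(y, neg_alpha, pos_alpha))
    return x < y and same_class
```

With landmarks -1 and 1, the pair (0.2, 0.5) is comparable because both numbers are infinitesimals, whereas (0.5, 2) is not, since 2 is a positive observable number.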

We will introduce a minimal system to handle orders of magnitude based on the 
proposed approach, whose linear ordering will then be extended to Q and, finally, to R. 

In our syntax we will consider the operators $\overrightarrow{\Box}$ and $\overleftarrow{\Box}$ to deal with the usual
ordering <, and the operators $\overrightarrow{\blacksquare}$ and $\overleftarrow{\blacksquare}$ to deal with $\sqsubset$. The intuitive meaning of each
modal operator is as follows:

$\overrightarrow{\Box} A$ means A is true for all numbers greater than the current one.

$\overrightarrow{\blacksquare} A$ means A is true for all numbers greater than and comparable to the current one.

$\overleftarrow{\Box} A$ means A is true for all numbers less than the current one.

$\overleftarrow{\blacksquare} A$ means A is true for all numbers less than and comparable to the current one.

Although the treatment presented in this work is considerably simpler than those 
stated at the beginning of this section, still it is useful as a stepping stone for considering 
more complex systems, for which the logic has to be enriched by adding new modal
operators capable of handling a larger number of milestones, equivalence classes and/or
qualitative relations. 

This paper is organized as follows: In Section 2 the syntax and the semantics of the
proposed logic are introduced; in Section 3 a minimal axiom system, which axiomatizes
validity in frames over an arbitrary set of real numbers, is presented, and then
some extensions dealing with Q or R are given. In Section 4 the completeness proof is
given, following a Henkin-style construction. Finally, in Section 5 some conclusions are
drawn and prospects for future work are presented.



2 Syntax and Semantics of the Language $\mathcal{L}(MQ)$

The syntax of our initial language for qualitative reasoning is introduced below: 

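The alphabet of the language $\mathcal{L}(MQ)$ is defined by using:

- A stock of atoms or propositional variables, $\mathcal{V}$.

- The classical connectives $\neg$, $\wedge$, $\vee$ and $\rightarrow$ and the constants $\top$ and $\bot$.

- The unary modal connectives $\overrightarrow{\Box}$, $\overleftarrow{\Box}$, $\overrightarrow{\blacksquare}$ and $\overleftarrow{\blacksquare}$.

- The constants $\alpha^{+}$ and $\alpha^{-}$.

- The auxiliary symbols: (, ).

Formulas are generated from $\mathcal{V} \cup \{\alpha^{+}, \alpha^{-}, \top, \bot\}$ by the construction rules of classical
propositional logic adding the following rule: If A is a formula, then so are $\overrightarrow{\Box} A$,
$\overleftarrow{\Box} A$, $\overrightarrow{\blacksquare} A$ and $\overleftarrow{\blacksquare} A$. The mirror image of A is the result of replacing in A each occurrence
of $\overrightarrow{\Box}$, $\overleftarrow{\Box}$, $\overrightarrow{\blacksquare}$, $\overleftarrow{\blacksquare}$, $\alpha^{+}$, $\alpha^{-}$ by $\overleftarrow{\Box}$, $\overrightarrow{\Box}$, $\overleftarrow{\blacksquare}$, $\overrightarrow{\blacksquare}$, $\alpha^{-}$, $\alpha^{+}$, respectively. We shall use
the symbols $\overrightarrow{\Diamond}$, $\overleftarrow{\Diamond}$, $\overrightarrow{\blacklozenge}$ and $\overleftarrow{\blacklozenge}$ as abbreviations respectively of $\neg\overrightarrow{\Box}\neg$, $\neg\overleftarrow{\Box}\neg$, $\neg\overrightarrow{\blacksquare}\neg$ and
$\neg\overleftarrow{\blacksquare}\neg$.
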
Definition 1. A multimodal qualitative frame for $\mathcal{L}(MQ)$ (or, simply, a frame) is a
tuple $\Sigma = (S, +\alpha, -\alpha, <)$, where

1. $S$ is a nonempty set of real numbers.

2. < is a strict linear order on $S$.

3. $+\alpha$ and $-\alpha$ are designated points in $S$ (called frame constants), and allow us to form
the sets $OBS^{+}$, $INF$, and $OBS^{-}$ defined below:

$OBS^{-} = \{x \in S \mid x \le -\alpha\}$    $INF = \{x \in S \mid -\alpha < x < +\alpha\}$
$OBS^{+} = \{x \in S \mid +\alpha \le x\}$

We will use $x \sqsubset y$ as an abbreviation of “$x < y$ and $x, y \in EQ$”, where $EQ \in
\{OBS^{+}, INF, OBS^{-}\}$.

Definition 2. Let $\Sigma$ be a multimodal qualitative frame. A multimodal qualitative model
on $\Sigma$ (or $\Sigma$-model, for short) is an ordered pair $\mathcal{M} = (\Sigma, h)$, where h is a meaning
function (or interpretation) $h : \mathcal{V} \rightarrow 2^{S}$. Any interpretation can be uniquely extended
to the set of all formulas in $\mathcal{L}(MQ)$ (also denoted by h) by using the usual conditions
for the classical boolean connectives and the constants $\top$ and $\bot$, and the following
conditions for the modal operators and frame constants:

$h(\overrightarrow{\Box} A) = \{x \in S \mid y \in h(A) \text{ for all } y \text{ such that } x < y\}$

$h(\overrightarrow{\blacksquare} A) = \{x \in S \mid y \in h(A) \text{ for all } y \text{ such that } x \sqsubset y\}$

$h(\overleftarrow{\Box} A) = \{x \in S \mid y \in h(A) \text{ for all } y \text{ such that } y < x\}$

$h(\overleftarrow{\blacksquare} A) = \{x \in S \mid y \in h(A) \text{ for all } y \text{ such that } y \sqsubset x\}$

$h(\alpha^{+}) = \{+\alpha\}$    $h(\alpha^{-}) = \{-\alpha\}$

The concepts of truth and validity are defined in a straightforward manner. 
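
To make the semantic clauses concrete, the following sketch evaluates formulas on a finite frame. The encoding is an assumption made for the example: atoms are strings, compound formulas are nested tuples, and the operator tags (`box_fut`, `bbox_fut`, `box_past`, `bbox_past`, `alpha+`, `alpha-`) are hypothetical names for the white/black future and past connectives and the two constants.

```python
from typing import Dict, Set, Tuple, Union

Formula = Union[str, Tuple]  # atoms are strings; compound formulas are tuples


def extension(formula: Formula, points: Set[float], neg_a: float, pos_a: float,
              h: Dict[str, Set[float]]) -> Set[float]:
    """Return the set of points of a finite frame at which `formula` is true."""

    def cls(x: float) -> str:
        # Landmarks are taken to belong to the observable classes.
        if x <= neg_a:
            return "OBS-"
        if x >= pos_a:
            return "OBS+"
        return "INF"

    def comparable_less(x: float, y: float) -> bool:
        return x < y and cls(x) == cls(y)

    # Accessibility relation used by each modal connective.
    relations = {
        "box_fut": lambda x, y: x < y,
        "bbox_fut": lambda x, y: comparable_less(x, y),
        "box_past": lambda x, y: y < x,
        "bbox_past": lambda x, y: comparable_less(y, x),
    }

    def ext(f: Formula) -> Set[float]:
        if isinstance(f, str):
            if f == "alpha+":
                return {pos_a}
            if f == "alpha-":
                return {neg_a}
            return set(h.get(f, set())) & points
        op = f[0]
        if op == "not":
            return points - ext(f[1])
        if op == "and":
            return ext(f[1]) & ext(f[2])
        if op == "or":
            return ext(f[1]) | ext(f[2])
        if op == "imp":
            return (points - ext(f[1])) | ext(f[2])
        if op not in relations:
            raise ValueError(f"unknown connective: {op}")
        related, sub = relations[op], ext(f[1])
        # Box clause: the subformula holds at every related point.
        return {x for x in points if all(y in sub for y in points if related(x, y))}

    return ext(formula)
```

A formula is true in the model exactly when its extension equals the whole set of points, so the function can be used to spot-check candidate axioms on small frames; such a check is only evidence on one frame, not a proof of validity.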



3 Axiomatic Systems for $\mathcal{L}(MQ)$

In this section we define several axiomatic systems for multimodal qualitative logic. A
list of axiom schemes and inference rules is presented in order to build the different
systems. We also consider all the tautologies of classical propositional logic. 

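Axiom schemata for $\overrightarrow{\Box}$, $\overleftarrow{\Box}$:

K1 $\overrightarrow{\Box}(A \rightarrow B) \rightarrow (\overrightarrow{\Box} A \rightarrow \overrightarrow{\Box} B)$

K2 $A \rightarrow \overrightarrow{\Box}\,\overleftarrow{\Diamond} A$

K3 $\overrightarrow{\Box} A \rightarrow \overrightarrow{\Box}\,\overrightarrow{\Box} A$

K4 $(\overrightarrow{\Diamond} A \wedge \overrightarrow{\Diamond} B) \rightarrow (\overrightarrow{\Diamond}(A \wedge B) \vee \overrightarrow{\Diamond}(\overrightarrow{\Diamond} A \wedge B) \vee \overrightarrow{\Diamond}(A \wedge \overrightarrow{\Diamond} B))$

K5 $\overrightarrow{\Box}\,\overrightarrow{\Box} A \rightarrow \overrightarrow{\Box} A$

K6 $\overrightarrow{\Box} A \rightarrow \overrightarrow{\Diamond} A$

K7 $(\overrightarrow{\Diamond} A \wedge \overrightarrow{\Diamond}\,\overrightarrow{\Box} \neg A) \rightarrow \overrightarrow{\Diamond}(\overleftarrow{\Box}\,\overrightarrow{\Diamond} A \wedge \overrightarrow{\Box} \neg A)$
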
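Axiom schema for $\overrightarrow{\blacksquare}$:

C1 $\overrightarrow{\blacksquare}(A \rightarrow B) \rightarrow (\overrightarrow{\blacksquare} A \rightarrow \overrightarrow{\blacksquare} B)$

Mixed axiom:

M1 $\overrightarrow{\Box} A \rightarrow \overrightarrow{\blacksquare} A$

Axiom schemata for constants, where $\xi$ denotes an element of the set $\{\alpha^{+}, \alpha^{-}\}$:

c1 $\overleftarrow{\Diamond}\xi \vee \xi \vee \overrightarrow{\Diamond}\xi$

c2 $\xi \rightarrow (\overleftarrow{\Box}\neg\xi \wedge \overrightarrow{\Box}\neg\xi)$

c3 $\alpha^{-} \rightarrow \overrightarrow{\Diamond}\alpha^{+}$

c4 $\alpha^{-} \rightarrow \overrightarrow{\blacksquare} A$

c5 $(\overleftarrow{\Diamond}\alpha^{-} \wedge \overrightarrow{\Diamond}\alpha^{+}) \rightarrow \overrightarrow{\blacksquare}(\overleftarrow{\Diamond}\alpha^{-} \wedge \overrightarrow{\Diamond}\alpha^{+})$

c6 $\overrightarrow{\Diamond}\alpha^{-} \rightarrow \overrightarrow{\blacksquare}(\alpha^{-} \vee \overrightarrow{\Diamond}\alpha^{-})$

c7 $(\alpha^{+} \wedge \overrightarrow{\blacksquare} A) \rightarrow \overrightarrow{\Box} A$

c8 $\overrightarrow{\blacksquare} A \rightarrow \overrightarrow{\Box}((\alpha^{-} \vee \overrightarrow{\Diamond}\alpha^{-}) \rightarrow A)$

c9 $(\overleftarrow{\Diamond}\alpha^{+} \wedge \overrightarrow{\blacksquare} A) \rightarrow \overrightarrow{\Box} A$

c10 $(\overleftarrow{\Diamond}\alpha^{-} \wedge \overrightarrow{\Diamond}\alpha^{+} \wedge \overrightarrow{\blacksquare} A) \rightarrow \overrightarrow{\Box}((\overleftarrow{\Diamond}\alpha^{-} \wedge \overrightarrow{\Diamond}\alpha^{+}) \rightarrow A)$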

We also consider as axioms the corresponding mirror images. 

Rules of inference: 

(MP) Modus Ponens for $\rightarrow$

(N$\overrightarrow{\Box}$) If $\vdash A$ then $\vdash \overrightarrow{\Box} A$
(N$\overleftarrow{\Box}$) If $\vdash A$ then $\vdash \overleftarrow{\Box} A$

Definition 3. The minimal system for $\mathcal{L}(MQ)$ is denoted $MQ$. It consists of the axioms
given by K1-K4 plus M1, C1, c1-c10 and the corresponding mirror images. $MQ_{\mathbb{Q}}$ is
the extension of $MQ$ by adding K5, K6 and their mirror images. Finally, $MQ_{\mathbb{R}}$ is
the extension of $MQ_{\mathbb{Q}}$ by adding K7 and its mirror image.

The concepts of proof and theorem are defined in a standard way. 



4 Soundness and Completeness 

The proof of soundness is straightforward, since validity of the axioms and preservation 
of validity by inference rules is simply a standard calculation. Thus, we need only to 
focus on completeness, for which a Henkin-style proof can be constructed. 

The proof of completeness follows the step-by-step method as in [1]; therefore,
some results about consistent (maximal consistent) sets of formulas are needed. Some
familiarity with the basic properties of maximal consistent sets is assumed. We shall
use $\mathcal{MC}$ to denote the set of all maximal consistent sets of formulas (mc-sets) of any of
the systems introduced in the previous section. We denote by $\mathcal{AS}$ any such axiomatic
system.

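Definition 4. Let $\Gamma_1, \Gamma_2 \in \mathcal{MC}$. Then:

1. $\Gamma_1 \triangleright \Gamma_2$ if and only if $\{A \mid \overrightarrow{\Box} A \in \Gamma_1\} \subseteq \Gamma_2$

2. $\Gamma_1 \blacktriangleright \Gamma_2$ if and only if $\{A \mid \overrightarrow{\blacksquare} A \in \Gamma_1\} \subseteq \Gamma_2$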

The three lemmas below state some modal properties of the relations $\triangleright$ and $\blacktriangleright$: the
behaviour with respect to the relations just introduced, the transitivity and linearity of
those orderings, and the existence of mc-sets with suitable properties. The statements
only cover the behaviour of the specific (black) modal connectives; the usual (white)
modalities have the same properties:

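Lemma 1. Let $\Gamma_1, \Gamma_2 \in \mathcal{MC}$, then:

1. $\Gamma_1 \blacktriangleright \Gamma_2$ if and only if $\{A \mid \overleftarrow{\blacksquare} A \in \Gamma_2\} \subseteq \Gamma_1$

2. $\Gamma_1 \blacktriangleright \Gamma_2$ if and only if $\{\overrightarrow{\blacklozenge} A \mid A \in \Gamma_2\} \subseteq \Gamma_1$

3. $\Gamma_1 \blacktriangleright \Gamma_2$ if and only if $\{\overleftarrow{\blacklozenge} A \mid A \in \Gamma_1\} \subseteq \Gamma_2$

4. (Lindenbaum's Lemma) Any consistent set of formulas in $\mathcal{AS}$ can be extended to
an mc-set in $\mathcal{AS}$.

Lemma 2. Consider $\Gamma_1, \Gamma_2, \Gamma_3 \in \mathcal{MC}$, then

1. If $\Gamma_1 \blacktriangleright \Gamma_2$ and $\Gamma_2 \blacktriangleright \Gamma_3$, then $\Gamma_1 \blacktriangleright \Gamma_3$.

2. If $\Gamma_1 \blacktriangleright \Gamma_2$ and $\Gamma_1 \blacktriangleright \Gamma_3$, then either $\Gamma_2 \blacktriangleright \Gamma_3$, or $\Gamma_3 \blacktriangleright \Gamma_2$, or $\Gamma_2 = \Gamma_3$.

3. If $\Gamma_2 \blacktriangleright \Gamma_1$ and $\Gamma_3 \blacktriangleright \Gamma_1$, then either $\Gamma_2 \blacktriangleright \Gamma_3$, or $\Gamma_3 \blacktriangleright \Gamma_2$, or $\Gamma_2 = \Gamma_3$.

Lemma 3. Assume $\Gamma_1 \in \mathcal{MC}$:

1. If $\overrightarrow{\blacklozenge} A \in \Gamma_1$, then there exists $\Gamma_2 \in \mathcal{MC}$ such that $\Gamma_1 \blacktriangleright \Gamma_2$ and $A \in \Gamma_2$.

2. If $\overleftarrow{\blacklozenge} A \in \Gamma_1$, then there exists $\Gamma_2 \in \mathcal{MC}$ such that $\Gamma_2 \blacktriangleright \Gamma_1$ and $A \in \Gamma_2$.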

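The following two lemmas are specific to our logic, since they relate the behaviour of the
specific and general connectives.

Lemma 4. Consider $\Gamma_1, \Gamma_2 \in \mathcal{MC}$ such that $\Gamma_1 \triangleright \Gamma_2$; then $\Gamma_1 \blacktriangleright \Gamma_2$ holds if and only
if one of the following conditions is fulfilled:

1. $\{\overleftarrow{\Diamond}\alpha^{-} \wedge \overrightarrow{\Diamond}\alpha^{+},\ \overleftarrow{\Diamond}\alpha^{+},\ \overrightarrow{\Diamond}\alpha^{-}\} \cap \Gamma_1 \cap \Gamma_2 \neq \emptyset$

2. $\alpha^{+} \in \Gamma_1$

3. $\alpha^{-} \in \Gamma_2$

Lemma 5. Given $\Gamma_1, \Gamma_2, \Gamma_3 \in \mathcal{MC}$ we have:

1. If $\Gamma_1 \blacktriangleright \Gamma_2$, then $\Gamma_1 \triangleright \Gamma_2$

2. If $\Gamma_1 \blacktriangleright \Gamma_2$, $\Gamma_1 \triangleright \Gamma_3$ and it is not the case that $\Gamma_1 \blacktriangleright \Gamma_3$, then $\Gamma_2 \triangleright \Gamma_3$

3. If $\Gamma_2 \blacktriangleright \Gamma_1$, $\Gamma_3 \triangleright \Gamma_1$ and it is not the case that $\Gamma_3 \blacktriangleright \Gamma_1$, then $\Gamma_3 \triangleright \Gamma_2$

4. If $\Gamma_1 \triangleright \Gamma_2 \triangleright \Gamma_3$ and $\Gamma_1 \blacktriangleright \Gamma_3$, then $\Gamma_1 \blacktriangleright \Gamma_2 \blacktriangleright \Gamma_3$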

We will sketch the proof of completeness by using the step-by-step method. The 
following definitions are needed in order to describe the construction method of each 
step in the proof. 

The construction is built upon the concept of pre-frame. 
Definition 5.

1. A pre-frame is a tuple obtained by eliminating either one or both frame constants
from a frame, that is, a pre-frame can be of the following forms: either $T = (S, <)$
or $T = (S, +\alpha, <)$ or $T = (S, -\alpha, <)$.

2. Given a pre-frame $T$, a trace of $T$ is a function $f_T : S \rightarrow 2^{\mathcal{L}(MQ)}$ such that, for
all $x \in S$, the set $f_T(x)$ is a maximal consistent set.

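Definition 6.

1. Given a frame $\Sigma$, a trace of $\Sigma$ is a function $f_\Sigma : S \rightarrow 2^{\mathcal{L}(MQ)}$ such that, for all
$x \in S$, the set $f_\Sigma(x)$ is a maximal consistent set.

2. Let $f_\Sigma$ be a trace of $\Sigma = (S, +\alpha, -\alpha, <)$. Then $f_\Sigma$ is called:

- Coherent if it satisfies:

(i) $\alpha^{+} \in f_\Sigma(+\alpha)$ and $\alpha^{-} \in f_\Sigma(-\alpha)$

(ii) for all $x, y \in S$:

(a) If $x < y$, then $f_\Sigma(x) \triangleright f_\Sigma(y)$

(b) If $x \sqsubset y$, then $f_\Sigma(x) \blacktriangleright f_\Sigma(y)$

- $\overrightarrow{\Diamond}$-prophetic if it is coherent and for all formulas A and all $x \in S$:

if $\overrightarrow{\Diamond} A \in f_\Sigma(x)$, there exists y such that $x < y$ and $A \in f_\Sigma(y)$ (1)

- $\overrightarrow{\blacklozenge}$-prophetic if it is coherent and for all formulas A and all $x \in S$:

if $\overrightarrow{\blacklozenge} A \in f_\Sigma(x)$, there exists y such that $x \sqsubset y$ and $A \in f_\Sigma(y)$ (2)

- $\overleftarrow{\Diamond}$-historic if it is coherent and for all formulas A and all $x \in S$:

if $\overleftarrow{\Diamond} A \in f_\Sigma(x)$, there exists y such that $y < x$ and $A \in f_\Sigma(y)$ (3)

- $\overleftarrow{\blacklozenge}$-historic if it is coherent and for all formulas A and all $x \in S$:

if $\overleftarrow{\blacklozenge} A \in f_\Sigma(x)$, there exists y such that $y \sqsubset x$ and $A \in f_\Sigma(y)$ (4)

- The conditional expression (1) (resp. (2), (3), (4)) is called a $\overrightarrow{\Diamond}$-prophetic
(resp. $\overrightarrow{\blacklozenge}$-prophetic, $\overleftarrow{\Diamond}$-historic, $\overleftarrow{\blacklozenge}$-historic) conditional for $f_\Sigma$ wrt $\overrightarrow{\Diamond} A$ (resp.
$\overrightarrow{\blacklozenge} A$, $\overleftarrow{\Diamond} A$ or $\overleftarrow{\blacklozenge} A$) and x. We also say in this case that such a conditional is
simply a conditional for $f_\Sigma$.

$f_\Sigma$ is called prophetic if it is $\overrightarrow{\Diamond}$-prophetic (or $\overrightarrow{\blacklozenge}$-prophetic) and it is called
historic if it is $\overleftarrow{\Diamond}$-historic (or $\overleftarrow{\blacklozenge}$-historic).

3. Let $f_\Sigma$ be a trace of $\Sigma = (S, +\alpha, -\alpha, <)$. $f_\Sigma$ is called full if it is prophetic and
historic.
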
Definition 7.

1. Let $\mathbb{S}$ be a denumerable infinite set. We consider the class, $\Xi_{\mathbb{S}}$, of finite frames
$(S, +\alpha, -\alpha, <)$, where $S$ is a nonempty finite subset of $\mathbb{S}$.

If $\Sigma_1 = (S_1, +\alpha_1, -\alpha_1, <_1)$, $\Sigma_2 = (S_2, +\alpha_2, -\alpha_2, <_2) \in \Xi_{\mathbb{S}}$, we say that $\Sigma_2$
is an extension of $\Sigma_1$ if the following conditions are satisfied: $S_1 \subseteq S_2$, $<_1 \subseteq <_2$,
$+\alpha_1 = +\alpha_2$, $-\alpha_1 = -\alpha_2$.

In a similar way we define that a pre-frame $T_1$ is an extension of the pre-frame $T_2$.

2. Let $f_\Sigma$ be a trace of a frame $\Sigma = (S, +\alpha, -\alpha, <) \in \Xi_{\mathbb{S}}$.

- Consider a $\overrightarrow{\Diamond}$-prophetic conditional for $f_\Sigma$ (with respect to $\overrightarrow{\Diamond} A$ and x):

if $\overrightarrow{\Diamond} A \in f_\Sigma(x)$, there exists $y \in S$ such that $x < y$ and $A \in f_\Sigma(y)$

This conditional is said to be active if $\overrightarrow{\Diamond} A \in f_\Sigma(x)$ but there is no y such
that $x < y$ and $A \in f_\Sigma(y)$; otherwise, if there exists y such that $x < y$ and
$A \in f_\Sigma(y)$, the conditional is said to be exhausted².

- The definitions of active and exhausted $\overrightarrow{\blacklozenge}$-prophetic conditionals are given in a
similar manner.

- For conditionals of type historic the definitions are similar.

Theorem 1 (Completeness Theorem). If A is a valid formula of L(MQ), then A is a 
theorem of MQ. 

Proof. The idea is to show that for any consistent formula A, it is possible to build
a multimodal qualitative frame $\Sigma = (S, +\alpha, -\alpha, <)$ and a full trace $f_\Sigma$, such that
$A \in f_\Sigma(x)$ for some $x \in S$. The frame $\Sigma$ will be the countable union of a sequence
of finite frames, $\Sigma_0, \Sigma_1, \ldots, \Sigma_n, \ldots$, taken from the class $\Xi_{\mathbb{S}}$ in Definition 7. However, a
preprocessing step is needed at the beginning of the construction; this preprocessing
is of length at most two, since we have to guarantee the introduction of the
frame constants $+\alpha$ and $-\alpha$.

Obtaining an Initial Frame. We define $T_0 = (S', <')$, where $S' = \{x_0\}$ and $<' = \emptyset$,
and the trace $f_{T_0}$ is defined as $f_{T_0}(x_0) = \Gamma_0$, where $\Gamma_0$ is a maximal consistent set
containing A, which exists by Lindenbaum's lemma. The next step depends on whether
$x_0$ is a frame constant (we take $x_0 = -\alpha$ if $\alpha^{-} \in \Gamma_0$, or $x_0 = +\alpha$ if $\alpha^{+} \in \Gamma_0$) or not.

On the one hand, assume that $x_0 = -\alpha$ (the case $x_0 = +\alpha$ is similar); then we have
$\overrightarrow{\Diamond}\alpha^{+} \in f_{T_0}(x_0)$. By Lemma 3(1) (with respect to white connectives) there exists $\Gamma_1$
such that $\alpha^{+} \in \Gamma_1$. Now, we select the frame $\Sigma_0 = (S_0, <_0)$ as follows:

- $S_0 = \{-\alpha, +\alpha\}$

- $<_0 = \{(-\alpha, +\alpha)\}$

and the corresponding trace is defined as $f_{\Sigma_0} = f_{T_0} \cup \{(+\alpha, \Gamma_1)\}$, which is clearly
coherent.

On the other hand, if we have $-\alpha \neq x_0 \neq +\alpha$, then we need to apply two steps like
the one previously described, one for introducing each frame constant.

2 In other words, a conditional is said to be active if the conditional expression is not satisfied,
whereas it is said to be exhausted if the consequent is satisfied.

From the Initial Frame Onwards. Once we have an initial frame to work with, for
the construction of $\Sigma$ we define an enumeration of the elements of $\mathbb{S} = \{x_i \mid i \in \mathbb{N}\}$ and
an enumeration of formulas $A_0, A_1, \ldots, A_n, \ldots$ of the language $\mathcal{L}(MQ)$. Therefore, a
code number can be assigned to each prophetic (historic) conditional in the usual way.

Assume that $\Sigma_n = (S_n, <_n)$ and $f_{\Sigma_n}$ are defined. If no conditional is active,
then $\Sigma_{n+1} = \Sigma_n$, $f_{\Sigma_{n+1}} = f_{\Sigma_n}$ and the construction is finished. Otherwise, i.e., if there
are prophetic (or historic) conditionals for $f_{\Sigma_n}$ with respect to $\overrightarrow{\Diamond} A$ (respectively, $\overleftarrow{\Diamond} A$,
$\overrightarrow{\blacklozenge} A$, $\overleftarrow{\blacklozenge} A$) and x which are active, then we choose the conditional C with the lowest
code number. By the exhausting lemma to be introduced later, there exists an extension
$\Sigma_{n+1} = (S_{n+1}, <_{n+1}) \in \Xi_{\mathbb{S}}$ of $\Sigma_n$ together with an extension $f_{\Sigma_{n+1}}$ of $f_{\Sigma_n}$ such that
the conditional C for $f_{\Sigma_{n+1}}$ is exhausted. The trace of each finite frame is coherent,
although, in general, it fails to be either prophetic or historic. It can be proved that the
final frame $\Sigma$, as defined, is such that $f_\Sigma$ is full. Thus, A is verified by the trace lemma
below. q.e.d.

The lemmas used in the sketch of the proof of completeness are stated below. 

Lemma 6 (Trace Lemma). Let $f_\Sigma$ be a full trace of a multimodal qualitative frame $\Sigma$.
Let h be an interpretation assigning each propositional variable, p, the set $h(p) = \{x \in
S \mid p \in f_\Sigma(x)\}$. Then, for any formula A, we have $h(A) = \{x \in S \mid A \in f_\Sigma(x)\}$.

Lemma 7 (Exhausting Lemma). Let $\Xi_{\mathbb{S}}$ be as in Definition 7, $f_{\Sigma_n}$ a coherent trace
of a frame $\Sigma_n \in \Xi_{\mathbb{S}}$, and suppose that there is a prophetic (historic) conditional, C,
for $f_{\Sigma_n}$ which is active. Then there is a frame $\Sigma_{n+1} \in \Xi_{\mathbb{S}}$ and a coherent trace $f_{\Sigma_{n+1}}$,
an extension of $f_{\Sigma_n}$, such that C is a conditional for $f_{\Sigma_{n+1}}$ which is exhausted.

The completeness of systems $MQ_{\mathbb{Q}}$ and $MQ_{\mathbb{R}}$ is straightforward, following [1].



5 Conclusions and Future Work 

A minimal multimodal language for the handling of qualitative reasoning has been in- 
troduced, a sound and complete system for multimodal qualitative reasoning has been 
presented, and its completeness theorem has been sketched. The importance of using a 
logical apparatus in the treatment of qualitative reasoning is the possibility of mecha- 
nization of its reasoning system. 

Obviously, this minimal language is still too poor to represent interesting real-world
problems, since a greater number of landmarks is usually considered in AI
applications of qualitative reasoning. As future work, we expect to extend the language
and automate its proof procedure, namely:

1. Integrate further modalities expressed in terms of the Absolute and/or Relative Orders
of Magnitude, such as closeness or negligibility, and consider a finer partition
of the real line.

2. Develop a tableau calculus as proof procedure. 
Abstract. The partitioning problem is a key issue in the design of Distributed
Virtual Environment (DVE) systems based on a server-network architecture. This
problem consists of efficiently assigning the clients of the simulation (avatars) to
the system servers. Although the existing literature proposes different evolutionary
approaches for solving this NP-hard problem, an approach based on genetic algorithms
is considered the current best partitioning mechanism.

In this paper, we analyze the impact of the low diversity of the initial popula- 
tion in this algorithm, and we propose a new mechanism for generating initial 
populations of higher quality. We also propose a new set of crossover methods 
oriented to problem specifications. Both improvements define a new genetic al- 
gorithm that provides better solutions than any other existing approach, in terms 
of both quality function and execution time. 



1 Introduction 

Distributed Virtual Environment (DVE) systems have experienced spectacular growth
in recent years. These systems allow multiple users, working on different computers that are
interconnected through different networks (and even through the Internet), to interact in a
shared virtual world. This is achieved by rendering images of the environment as a user
located at that point in the virtual environment would perceive them. Each user is represented
in the shared virtual environment by an entity called avatar, whose state is
controlled by the user input. Since DVE systems support visual interactions between 
multiple avatars, every change in each avatar must be propagated to some avatars in the 
shared virtual environment. DVE systems are currently used in many different applica- 
tions [20], such as collaborative design [18], civil and military distributed training [12], 
e-learning [17] or multi-player games [1,7]. 

One of the key issues in the design of a scalable DVE system is the partitioning 
problem. It consists of efficiently assigning the workload (avatars) among different 
servers in the system [8]. The partitioning problem determines the overall performance 
of the DVE systems, since it has an effect not only on the workload that each server 
in the system supports, but also on the inter-server communications (and therefore on 

* Supported by the Spanish MCYT under Grants DPI-2002-04438-C02-02 and
TIC-2003-08154-C06-04.

the network traffic). Although the partitioning problem in DVE systems has usually been
addressed with ad-hoc procedures [20,9], recent works propose partitioning schemes
following evolutionary approaches [14-16]. One of these approaches, based on a genetic
algorithm, offers good performance in terms of execution time and achieves low values
of the quality function.

In this paper, we propose a set of improvements for this genetic approach in order to
develop more scalable and cost-effective DVE systems. These improvements consist
of maximizing the structural diversity of the initial population and incorporating a
new crossover mechanism. In this mechanism, each chromosome of the current population
randomly chooses a crossover operation from a small set of oriented crossovers.
Performance evaluation results show that, due to its ability to avoid premature convergence
of solutions, the proposed method can provide better solutions while requiring
shorter execution times than other methods proposed in the literature.

The rest of the paper is organized as follows: Section 2 describes the partitioning 
problem and the proposed techniques for solving it. Section 3 shows the implementation 
of the proposed genetic approach for solving the partitioning problem. Next, Section 4 
presents the performance evaluation of the proposed search method. Finally, Section 5 
presents some concluding remarks and future work to be done. 



2 The Partitioning Problem in DVE Systems 

Architectures based on networked servers are becoming a de-facto standard for DVE 
systems [20, 9, 15]. In these architectures, the control of the simulation relies on several 
interconnected servers. Multi-platform client computers are connected to one of these 
servers. When a client modifies an avatar, it also sends an updating message to its server,
which in turn must propagate this message to other servers and clients. Servers must render
different 3D models, perform positional updates of avatars and transfer control information
among different clients. Thus, each new avatar represents an increase in both
the computational requirements of the application and the amount of network
traffic. When the number of connected clients increases, the number of updating messages
must be limited in order to avoid a message outburst. In this sense, concepts like
areas of influence (AOI) [20] or locales [2] have been proposed for limiting the number
of neighboring avatars that a given avatar must communicate with. All these concepts
define a neighborhood area for avatars, in such a way that a given avatar must notify
its movements (by sending an updating message) only to those avatars located in that
neighborhood. These avatars are denoted as neighbor avatars.

Depending on their origin and destination avatars, messages in a DVE system can
be intra-server or inter-server messages. Figure 1 shows an example of a DVE system
consisting of several servers interconnected through a network. This figure also shows
an example of both intra-server and inter-server avatar updating messages. In this figure,
avatars are uniformly distributed and they are represented as dots. Each server manages
a given number of clients (avatars) and decides which avatars are the destinations for
the messages received from other avatars. Inter-server messages are those
messages whose origin and destination avatars are assigned to different servers. Otherwise,
the message is an intra-server message. In order to design scalable DVE systems,
the number of inter-server messages must be minimized. Indeed, intra-server messages
concern only a single server, and therefore they minimize
the computing, storage and communication requirements for maintaining a consistent
view of the virtual world for all avatars in a DVE system.




Fig. 1. Example of a multi-server DVE system (the figure marks the AOI of avatars)

Lui and Chan propose in [8] a quality function, denoted as C_P, for evaluating
each assignment of clients to servers. This quality function takes into account two
parameters. One of them is the computing workload generated by clients in
the DVE system, denoted as C_P^W. In order to minimize this parameter, the computing
workload should be proportionally distributed among all the servers in the DVE system,
according to the computing resources of each server. The other parameter of the
quality function is the overall inter-server communication cost for sending the
messages generated by all avatars, denoted as C_P^L. In order to minimize this parameter,
avatars sharing the same AOI should be assigned to the same server. Quality function
C_P is defined as

C_P = W_1 C_P^W + W_2 C_P^L        (1)

where W_1 and W_2 are two coefficients that weight the relative importance of the
computational and communication workload, respectively (W_1 + W_2 = 1). These coefficients
should be tuned according to the specific features of each DVE system. Using
this quality function (and assuming W_1 = W_2 = 0.5), Lui and Chan propose an ad-hoc
approach, called LOT, that re-assigns clients to servers [9]. The partitioning algorithm
should be periodically executed for adapting the partition to the current state of the
DVE system as it evolves (avatars can join or leave the DVE system at any time, and
they can also move anywhere within the simulated virtual world). Lui and Chan
have also proposed a testing platform for the performance evaluation of DVE systems, as
well as a parallelization of the partitioning algorithm [9].
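
As a rough illustration of Eq. (1), the following Python sketch evaluates a partition. The text above does not give the formulas for the two terms, so both are assumptions made only for this example: the workload term is taken as the total deviation of each server's avatar count from an even share (servers are assumed to have equal resources), and the communication term as the number of avatar pairs within AOI range that are hosted on different servers. The names `workload_cost`, `comm_cost`, `quality` and `aoi_radius` are hypothetical.

```python
import math


def workload_cost(assignment, num_servers):
    """Assumed C_P^W: total deviation of per-server avatar counts
    from a perfectly even share (servers assumed identical)."""
    ideal = len(assignment) / num_servers
    counts = [assignment.count(s) for s in range(num_servers)]
    return sum(abs(c - ideal) for c in counts)


def comm_cost(assignment, positions, aoi_radius):
    """Assumed C_P^L: number of avatar pairs within AOI range of each
    other that are hosted on different servers."""
    cost = 0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            close = math.dist(positions[i], positions[j]) <= aoi_radius
            if close and assignment[i] != assignment[j]:
                cost += 1
    return cost


def quality(assignment, positions, num_servers, aoi_radius, w1=0.5, w2=0.5):
    """C_P = W_1 * C_P^W + W_2 * C_P^L, with W_1 + W_2 = 1."""
    return (w1 * workload_cost(assignment, num_servers)
            + w2 * comm_cost(assignment, positions, aoi_radius))


# Four avatars forming two close pairs, two servers.
positions = [(0, 0), (1, 0), (10, 0), (11, 0)]
grouped = [0, 0, 1, 1]   # each pair shares a server
split = [0, 1, 0, 1]     # each pair is separated

print(quality(grouped, positions, 2, 2.0))
print(quality(split, positions, 2, 2.0))
```

Under these assumptions both partitions are perfectly balanced, so only the communication term distinguishes them: keeping AOI neighbors on the same server yields the lower (better) value.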

Since the partitioning problem in DVE systems can be considered a combinatorial
optimization problem, different solutions based on metaheuristic techniques have
been proposed [13-16]. Although [13] describes a constructive strategy based on GRASP,
the best partitioning results have been obtained using evolutionary techniques [15]. Among
these evolutionary techniques, genetic methods offer especially good results in terms
of execution time and quality of the provided solutions.

Although the genetic approach proposed in [15] performs reasonably well, it is
based on the generation of an initial population obtained by deriving a unique solution
provided by a k-means clustering algorithm [16]. Although this feature offers an
improved initial population of feasible solutions, it does not focus on maximizing the
structural diversity of chromosomes. As described in [11] and [4], this low level of
structural diversity can lead the algorithm to reach a local minimum or even a poorer
approximation to this value. Additionally, the crossover mechanism used by this algorithm
is based on an auto-fertilization technique, where chromosomes are derived
following a single-point crossover [5]. This crossover mechanism is excessively generic,
and it is possible to offer new crossover strategies more closely tailored to the problem
specifications.

3 A New Genetic Technique for Solving the Partitioning Problem 

In order to improve the performance of our current approach for solving the partitioning
problem in DVE systems, we propose a new version of the genetic algorithm. This
new version focuses on maximizing the structural diversity of the initial population and
improving the crossover operator. The new algorithm replaces the current generation
of the initial population with a new method based on random projections. Additionally,
the crossover is performed by randomly choosing a mechanism from a list composed
of five different crossover operators. All five operators are closely tailored to the
specifications of the partitioning problem in DVE systems.

3.1 Generation of a Heuristic Initial Population 

Most metaheuristics are based on the fast generation of an initial population of
elements [15, 11]. This initial population usually represents a set of poor solutions to the
problem, and it is evolved through a crossover operator until a stopping criterion (for
example, a given number of iterations) is reached.

If the initial population has been correctly defined, then the metaheuristic algorithm 
easily obtains a good approximation to the global optimum. Moreover, as it is described 
in [11], if the initial population is not randomly selected then the algorithm should 
maintain a certain level of structural diversity among all the chromosomes, in order 
to properly represent the whole set of feasible solutions and to avoid the premature 
convergence of the search. 

Taking into account these considerations, we propose a new mechanism, called Pro- 
jections algorithm (PA), in order to generate the new population of initial solutions. This 
fast algorithm provides a set of independent and well-diversified initial solutions to the 
problem. 

PA consists of a given number of iterations n_c, which defines the number of chromosomes
in the population (population size). Each one of the n_c chromosomes represents
a complete partitioning solution to the problem where all the N avatars (A_0, ..., A_{N-1})
in the DVE are assigned to the M servers (S_0, ..., S_{M-1}) in the system. An iteration
consists of four steps, illustrated in Figure 2 (for the sake of clarity, we will consider in the
following description that avatars move across a 2-D virtual world; the extrapolation to 3-D
worlds is reasonably straightforward):

First, each avatar in the system is assigned to server S_0 (Fig. 2a). Next, a random
value θ between 0 and π/2 is generated. Since all avatars are located on a Cartesian
plane, PA draws a straight line which passes through the origin (0,0) at
an angle of θ radians (Fig. 2b). This line and its perpendicular define a new coordinate
system that is rotated θ radians with respect to the original one. Using a simple affine
transformation (Fig. 2c), the old coordinates (X_i, Y_i) of each avatar can now be expressed
with respect to the rotated axes by the new coordinates (X'_i, Y'_i). At this point, PA generates
two different binary search trees, using the X'_i and Y'_i values of each avatar as search keys.
Once both trees are created, the N/M avatars with the highest keys and the N/M avatars with
the lowest keys in each tree are put into four different sets [19]. In order to assign a set of
avatars to a server, the third step of PA evaluates the different sets of avatars separately using
the C_P function. Since all the sets have the same cardinality N/M, PA only computes
the C_P^L term. The term C_P^W is not computed, since it evaluates the standard deviation
of the assigned avatars with respect to perfect balancing. The last step is to select
the set with the lowest C_P^L value, and all avatars contained in this set are assigned
to server S_1 (Fig. 2d). At this point, N(M-1)/M avatars are still assigned to server
S_0, and N/M avatars are assigned to server S_1. The next iteration allows server S_0 to lose
another N/M avatars, which are assigned to S_2. PA finishes when the last
group of avatars (with a size less than or equal to N/M avatars) is assigned to server S_{M-1}.
At this point, a number of avatars very close to N/M is assigned to each server
in the DVE system.
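
The steps above can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the authors' implementation: sorting replaces the two binary search trees, a fresh angle is drawn each time a group of avatars is moved off S_0, and `cut_cost` is a hypothetical stand-in for the communication term (it counts the AOI-neighbor pairs that a candidate set would separate). All function and parameter names are illustrative.

```python
import math
import random


def rotate(point, theta):
    """Coordinates of `point` in axes rotated by `theta` radians."""
    x, y = point
    return (x * math.cos(theta) + y * math.sin(theta),
            -x * math.sin(theta) + y * math.cos(theta))


def cut_cost(candidate, positions, aoi_radius):
    """Hypothetical communication term: avatar pairs within AOI range
    with exactly one member inside the candidate set."""
    inside = set(candidate)
    cost = 0
    for i in inside:
        for j in range(len(positions)):
            if j not in inside and math.dist(positions[i], positions[j]) <= aoi_radius:
                cost += 1
    return cost


def projections_chromosome(positions, num_servers, aoi_radius, rng):
    """Build one chromosome: a list mapping avatar index -> server index."""
    n = len(positions)
    if n < num_servers:
        raise ValueError("need at least one avatar per server")
    group = n // num_servers
    assignment = [0] * n                           # every avatar starts on S_0
    for server in range(1, num_servers):
        theta = rng.uniform(0.0, math.pi / 2)      # random rotation angle
        remaining = [i for i in range(n) if assignment[i] == 0]
        rotated = {i: rotate(positions[i], theta) for i in remaining}
        by_x = sorted(remaining, key=lambda i: rotated[i][0])
        by_y = sorted(remaining, key=lambda i: rotated[i][1])
        # Four candidate sets: lowest/highest keys along each rotated axis.
        candidates = [by_x[:group], by_x[-group:], by_y[:group], by_y[-group:]]
        best = min(candidates, key=lambda c: cut_cost(c, positions, aoi_radius))
        for i in best:                             # move the cheapest set
            assignment[i] = server
    return assignment


def initial_population(positions, num_servers, aoi_radius, size, seed=0):
    """Generate `size` chromosomes, each from independent random angles."""
    rng = random.Random(seed)
    return [projections_chromosome(positions, num_servers, aoi_radius, rng)
            for _ in range(size)]
```

With 13 avatars and 3 servers (the SMALL configuration used later in the paper), this sketch moves 4 avatars to each of S_1 and S_2 and leaves 5 on S_0.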




Fig. 2. Generation of the initial population based on Projection Algorithm 

It is important to mention that the use of four binary search trees improves
diversity when selecting the sets of avatars to be assigned to the target server in each
iteration. The random generation of rotation angles guarantees inherently independent
solutions in the generation of individuals of the initial population. Moreover, this population
allows the genetic approach to evolve solutions with good values of the quality
function C_P. These good C_P values are achieved by balancing the number of avatars
among the servers and assigning to the same server the avatars that are closely located
in the virtual scene.



3.2 Providing Randomness to Chromosome Crossover 

In order to evolve chromosomes, the first genetic approach for solving the partitioning problem
in DVE systems uses an auto-fertilization technique [16].
In this technique, each chromosome generates a new chromosome following a single-point
crossover with a probability equal to one [5]. When the crossover operator has been
applied to all individuals of the population (that is, the child population has been completely
created), an elitist selection guarantees the survival of the best individuals. This crossover
operator is excessively generic and has often been used for solving different combinatorial
problems.

The proposed technique is based on the random selection of crossover operators
from a list. This list, which is accessed for each derivation, consists of five operators
closely tailored to the problem specifications. The operator list consists of the following
elements:

Operator 1. Random exchange of the current assignment of two border avatars. A
given avatar A_i is a border avatar if it is assigned to a certain server S_r in the initial
partition and any of the avatars in its AOI is assigned to a server different from S_r
[14].

Operator 2. Once a border avatar A_i has been randomly selected, it is randomly assigned
to one of the servers S_f hosting the border avatars of A_i.

Operator 3. Besides the step described in the previous operator, if there exists an avatar
A_j such that A_j is assigned to S_f and it is a neighbor avatar of A_i, then A_j is
assigned to S_r.

Operator 4. Since each avatar generates a certain level of workload in the server
it is assigned to [8], it is possible to sort the servers of a DVE system according
to the level of workload they support. If S_m and S_n are the servers with the highest
and the lowest level of workload in the system, respectively, then a random avatar
A_k assigned to S_m is assigned to S_n.

Operator 5. Besides the step described in the previous operator, a random avatar A_l,
initially assigned to S_n, is now assigned to S_m.
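
The operator pool can be sketched in Python as below. This is an illustrative reading of the five operators, not the authors' code. It assumes that the workload of a server is simply its number of avatars, that "the servers hosting the border avatars of A_i" means the servers hosting AOI neighbors of A_i, and that operators 3 and 5 extend operators 2 and 4 through a hypothetical `swap_back` flag. All names are made up for the example.

```python
import math
import random


def neighbors(i, positions, aoi_radius):
    """Indices of the avatars inside the AOI of avatar i."""
    return [j for j in range(len(positions))
            if j != i and math.dist(positions[i], positions[j]) <= aoi_radius]


def border_avatars(assignment, positions, aoi_radius):
    """Avatars with at least one AOI neighbor hosted on another server."""
    return [i for i in range(len(assignment))
            if any(assignment[j] != assignment[i]
                   for j in neighbors(i, positions, aoi_radius))]


def op_swap_border(child, positions, aoi_radius, rng):
    """Operator 1: exchange the servers of two random border avatars."""
    border = border_avatars(child, positions, aoi_radius)
    if len(border) >= 2:
        a, b = rng.sample(border, 2)
        child[a], child[b] = child[b], child[a]


def op_move_border(child, positions, aoi_radius, rng, swap_back=False):
    """Operator 2; with swap_back=True, Operator 3."""
    border = border_avatars(child, positions, aoi_radius)
    if not border:
        return
    a = rng.choice(border)
    source = child[a]
    foreign = [j for j in neighbors(a, positions, aoi_radius) if child[j] != source]
    target = child[rng.choice(foreign)]
    if swap_back:   # a neighbor hosted on the target server takes a's old server
        movable = [j for j in foreign if child[j] == target]
        child[rng.choice(movable)] = source
    child[a] = target


def op_rebalance(child, num_servers, rng, swap_back=False):
    """Operator 4; with swap_back=True, Operator 5."""
    load = [child.count(s) for s in range(num_servers)]   # assumed workload
    heavy = max(range(num_servers), key=load.__getitem__)
    light = min(range(num_servers), key=load.__getitem__)
    if load[heavy] == load[light]:
        return                                            # already balanced
    on_heavy = [i for i, s in enumerate(child) if s == heavy]
    on_light = [i for i, s in enumerate(child) if s == light]
    child[rng.choice(on_heavy)] = light
    if swap_back and on_light:
        child[rng.choice(on_light)] = heavy


def derive(parent, positions, num_servers, aoi_radius, rng):
    """Derive a child by applying one operator chosen at random from the pool."""
    child = list(parent)
    pool = [
        lambda: op_swap_border(child, positions, aoi_radius, rng),
        lambda: op_move_border(child, positions, aoi_radius, rng),
        lambda: op_move_border(child, positions, aoi_radius, rng, swap_back=True),
        lambda: op_rebalance(child, num_servers, rng),
        lambda: op_rebalance(child, num_servers, rng, swap_back=True),
    ]
    rng.choice(pool)()
    return child
```

Each call to `derive` leaves the parent untouched and returns a child of the same length, so the result can be passed directly to an elitist selection step.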



4 Performance Evaluation 

This section presents the performance evaluation results obtained with the
new genetic algorithm described in the previous section when it is used for solving the
partitioning problem in DVE systems. Following the standard evaluation methodology
described in [8] and used in [9, 13-16], we have empirically tested the new approach in
two examples of a DVE system: a SMALL world, composed of 13 avatars and 3 servers,
and a LARGE world, composed of 2500 avatars and 8 servers. We have considered
two parameters: the value of the quality function C_P for the partition provided by the
proposed search method and the computational cost, in terms of execution time,
required by the search method in order to provide that partition.

Our evaluation tool models the behavior of a generic DVE system with a server-network
architecture on a real network of heterogeneous computers. Each server is implemented
in a single PC, while up to 50 clients are allocated in the same PC. Following
this configuration, a battery of DVE systems was tested. This battery was composed
of 400 SMALL worlds and 300 LARGE worlds. We have used a 10 Mbps Ethernet as
the interconnection network. The hardware platform used for the evaluation was a
Pentium IV at 1.7 GHz with 256 Mbytes of RAM, running the Windows 2000
Professional operating system.

4.1 Tuning of Genetic Algorithm

As described in [14, 16], the parameters of the genetic algorithm that should be tuned 
in order to achieve optimal performance are the population size, the number of genera- 
tions, and the mutation rate. 

Figure 3 shows the convergence of the proposed approach as the number of gen- 
erations and mutation rate vary in a LARGE DVE configuration. This convergence is 
expressed in terms of fitness function C p . Due to space limitations, the variation of the 
population size is not shown in this figure. The incidence of this parameter in the be- 
havior of the algorithm is very similar to the incidence of the number of generations in 
the same DVE system. 




Fig. 3. C_P values obtained for different numbers of generations (left plot, x-axis from 60
to 660 generations) and mutation rates (right plot, in %) in a LARGE DVE system

Figure 3 shows (plot on the left) that, for all the considered distributions,
the C_P value provided by the proposed algorithm decreases as the number of iterations
increases, until a value of 400 iterations is reached. From that point on, quality function
C_P decreases slightly or remains constant, depending on the considered distribution of
avatars. The same behavior is observed for the population size when it reaches values close
to 20 chromosomes. Although it is not shown here, larger numbers of iterations
require too much execution time and do not reach significantly better solutions
in terms of C_P. On the other hand, a different behavior is observed when the mutation rate
is varied (plot on the right in Figure 3). The values of quality function C_P provided
by the proposed algorithm decrease as this parameter grows, until the mutation rate reaches
values close to 8-9%. Beyond this point, the genetic algorithm starts to obtain worse
solutions in terms of C_P. The reason for this behavior is that if an excessive number of
mutations is performed for a given generation of individuals, then the evolving process
of maintaining a set of high quality solutions is excessively degraded.

Therefore, the parameter values selected for the proposed genetic method in
LARGE DVE systems are 400 iterations, 20 chromosomes and a mutation
rate of 9%.

4.2 Evaluation Results 

For comparison purposes, we have evaluated the performance of the proposed approach,
as well as the linear optimization technique (LOT) described in [9] and the basic
genetic approach (BGA) presented in [16]. The latter method currently provides the
best results for the partitioning problem in DVE systems. In the case of SMALL worlds
we have also performed an exhaustive search through the solution space, obtaining the
best possible partition. Since this exhaustive method requires the exploration of a domain
composed of 3^13 (1,594,323) different solutions in SMALL worlds, it
becomes unaffordable when LARGE worlds are considered (8^2500 different solutions).
On the other hand, since the performance of the heuristic search methods may heavily depend
on the location of the avatars, we have considered three different distributions of
avatars: uniform, skewed, and clustered.
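
The size of the solution space quoted above follows from the fact that each of the N avatars can be assigned to any of the M servers, giving M^N candidate partitions. A short Python check (the function name `search_space_size` is ours) makes the gap between the two configurations concrete:

```python
def search_space_size(num_avatars, num_servers):
    """Number of ways to assign every avatar to one of the servers."""
    return num_servers ** num_avatars


small = search_space_size(13, 3)      # SMALL world: 13 avatars, 3 servers
large = search_space_size(2500, 8)    # LARGE world: 2500 avatars, 8 servers

print(small)             # 1594323
print(len(str(large)))   # 2258 (number of decimal digits)
```

A value with more than two thousand decimal digits rules out exhaustive enumeration for LARGE worlds, which is why only heuristic methods are compared there.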

Table 1 and Table 2 show the evaluation results for SMALL and LARGE virtual
worlds, respectively. These tables show the C_P values corresponding to the final
partitions provided by two versions of the proposed genetic approach (V1 and V2),
and also the execution times required in order to obtain these
final partitions. Additionally, they show the same results obtained for the BGA and
LOT methods.



Table 1. Results for a battery of SMALL DVE systems

             Uniform distribution     Skewed distribution      Clustered distribution
             T_exe (sec.)   C_P       T_exe (sec.)   C_P       T_exe (sec.)   C_P
Exhaustive   3.411          6.54      3.843          7.04      4.783          7.91
LOT          0.0009         6.56      0.001          8.41      0.0011         8.89
BGA          0.002          6.54      0.003          7.04      0.005          7.91
V1           0.002          6.54      0.003          7.041     0.006          7.91
V2           0.003          6.54      0.003          7.04      0.006          7.91



Table 2. Results for a battery of LARGE DVE systems

             Uniform distribution     Skewed distribution      Clustered distribution
             T_exe (sec.)   C_P       T_exe (sec.)   C_P       T_exe (sec.)   C_P
LOT          30.94          1637.04   32.18          3460.52   43.31          5903.80
BGA          6.65           1832.2    13.79          2825.6    29.22          4905.93
V1           6.24           547.2     14.05          612.9     28.74          1002.51
V2           6.41           321.3     14.59          450.8     28.65          791.94



In order to evaluate the improvements introduced by the methods proposed in this
paper, version V1 implements the BGA approach where the initial population has been
created following the projections algorithm presented in Section 3.1. In addition to this
improvement, version V2 not only starts with this new initial population but also
implements the crossover mechanism described in Section 3.2.

These tables show that both the V1 and V2 approaches obtain results similar to those of the
LOT and BGA methods for all the considered distributions of avatars in SMALL worlds.
However, as described in [9] and [15], the main purpose of a partitioning method is to
improve the scalability of DVE systems. Therefore, it must provide a significant performance
improvement when it is used in LARGE DVE systems. The results obtained
for the LARGE world show that the quality of the provided partitions is increased in
terms of C_P values. Both the V1 and V2 approaches require execution times similar to those
of the BGA method. However, the V2 method is able to decrease the C_P value from 1832.2 to
321.3 when uniform distributions of avatars are considered. In the case of skewed and
clustered distributions, C_P values are decreased in a similar proportion by both methods
V1 and V2. These results show that by simply improving the structural diversity of the
initial population (version V1) it is possible to evolve the population of chromosomes
towards more efficient solutions.



5 Conclusions and Future Work 

Current Distributed Virtual Environments (DVE) are usually designed following server-network
architectures. In these architectures, an NP-hard problem called the partitioning
problem has become a critical issue in the design of efficient and scalable DVE systems.

In this paper, we have analyzed and improved a recent method based on a genetic
algorithm designed for solving this problem. This method currently provides the best
results for the partitioning problem in DVE systems. One of the proposed improvements
for this genetic method consists of using a new stochastic algorithm for the generation
of the initial population that maximizes the structural diversity of chromosomes. On the
other hand, we have proposed the replacement of the traditional single-point crossover
operator by a pool of five different crossover mechanisms tailored to the problem
specification.

Performance evaluation results show that the proposed implementation of the ge- 
netic method provides better solutions to the partitioning problem than the current ap- 
proaches to the problem. Therefore, the proposed approach can improve the efficiency 
and scalability of DVE systems. 

As future work, we plan to design a parallel implementation of the GRASP
approach. This new design will be based on a master-slave configuration and will be
implemented in conjunction with a post-optimization procedure.

Abstract. We describe in this paper an ITS called SIAL that supports 
the learning of problem solving skills in computational logic from obtain- 
ing the clause form of simple well formed formulae to hyperresolution. 
The core function in SIAL is the error diagnosis module, that has the 
role of detecting and interpreting the mistakes of the learner while he/she 
is solving the exercises. It combines both model-based and knowledge- 
based (expertise) diagnosis in order to achieve more accurate results. 
SIAL complements this core function with a flexible user interface and 
a pedagogical module that offers three modes of interaction adapted to 
the learner’s level of expertise. SIAL is currently being tested by a group 
of volunteers in order to measure and tune its accuracy, as a preliminary 
step before performing tests in real conditions. 

Keywords: Model-based Diagnosis, Intelligent Tutoring Systems, Com- 
putational Logic, Knowledge-based Systems 



1 Introduction 

New techniques for improving teaching quality, promoting students’ motivation
and more individualized learning are being demanded by modern universities.
Educational software may be seen as a valuable tool for fulfilling these
demands. Intelligent Tutoring Systems (ITS) are computer programs that can
be used to achieve that goal. We have therefore been working on developing an
ITS for reinforcing some topics in Artificial Intelligence (AI) subjects, specifically
topics related to automatic theorem proving (automated reasoning), which is one of the
most important milestones in the development of AI (and even of Computer Science) and
hence a main topic in basic AI courses and literature.

Although we devote great care and attention to explaining the basics of automated
theorem proving to students, it has proved to be a difficult topic for them to
master. We observed the usefulness of computer-aided instruction when teaching
this AI topic in undergraduate courses. Standard programs such as SWI-Prolog
and others developed for the purpose such as SLI (Logic Inferential System),
a theorem prover previously developed by our research group, were integrated
in practical sessions, and a significant improvement in student performance was
attained [1]. Although the main feature of SLI is displaying a graphical representation
of the refutation process, the main benefits observed with this experience
were a higher student participation in the learning process, which increased
their motivation and interest in the subject, and a better understanding of the
concepts displayed. This experience led us to develop a new program: SIAL (Intelligent
System for the Learning of Logic), more focused on pedagogical issues
[1, 2].

Computational logic (first order logic) not only encompasses student exercises
covering the production of the clause form, predicate unification, application
of the resolution rule, etc., but also those problems related to the semi-decidable
nature of this kind of logic [3]. The latter area involves computational
undecidability, which borders the limits of known mathematics and makes the
problem a very interesting challenge for developing an intelligent tutor.

The next sections are organized as follows: the SIAL environment is pre- 
sented, then the system architecture is sketched, focusing on the diagnosis sub- 
system. Section 4 shows an example of the use of SIAL, and the paper finishes 
by presenting the main conclusions. 

2 A Tutoring System for Computational Logic: SIAL 

SIAL is an ITS designed to perform as a practical tool that automatically checks
the user’s solution to a proposed exercise. SIAL behaves like a diagnostic
assistant rather than an expert instructor, providing computational support
in order to help the user to consolidate some concepts and problem solving procedures.
It operates by proposing a set of exercises to the user (learner), which have to be
solved using well known techniques and methods in computational logic.

The main objectives pursued with SIAL are: 

- Improvement of learning by means of a software application that facilitates
  the assimilation of abstract concepts and problem solving skills.

- Availability of a laboratory application to monitor the student learning.

In the next subsections we describe SIAL, including the topics of computa- 
tional logic covered by the system, the modes of interaction, the system archi- 
tecture and the user interface. 

2.1 SIAL Pedagogical Capabilities 

The tutor is organized in thematic levels (Table 1), ranging from the simplest 
problem solving skills to the most complex ones [4] . Levels 1-6 include converting 
well-formed formulae (wff) to clause form, predicate unification, binary resolu- 
tion, resolution refutation and factoring rule. These first six levels constitute the 
basic skills to be acquired, thus forming the basis for the following levels. 

Levels 7 to 12 deal with methods for selecting clauses to yield a new resolvent
and reducing the number of generated clauses. Pure literals, tautologies and
subsumed clauses should be detected and removed. Also, the set of support strategy is
included as a way to control the progressive increase in the number of clauses.
Level 12 introduces the hyperresolution rule.

Table 1. Description of each level defined in SIAL, guiding and topic classification.

Level  Guiding  Topic     Description
1      Strong   Single    wff to clause form (without Skolemization process).
2      Strong   Single    wff to clause form (with Skolemization process).
3      Strong   Single    Unification procedure.
4      Strong   Compound  Resolution rule.
5      Strong   Compound  Strongly guided refutation resolution.
6      Strong   Compound  Factoring rule.
7      Weak     Compound  Weakly guided refutation resolution.
8      Weak     Compound  Pure literal removing.
9      Weak     Compound  Tautology removing.
10     Weak     Compound  Support set strategy.
11     Weak     Compound  Subsumption.
12     Weak     Compound  Hyperresolution (positive/negative hyperresolution).

SIAL levels can be grouped depending on the type of interaction the user is
allowed to carry out. As the type of interaction influences the learning process,
three kinds of interaction are considered. The most basic levels (1-6) provide
strongly guided interaction, which encourages the user towards stepwise interaction.
It is intended for novice learners, who are expected to need strong
support from the system. The next levels (7-12) implement weakly guided interaction,
where some intermediate steps can be skipped, going straight to the
refutation stage. This is thought to be appropriate for middle-level users. Finally,
the system implements an automatic mode that works as a conventional theorem
prover, to which the user can propose his own problem and ask the system to
solve it.

Another grouping is possible. Levels 1-3 are devoted to only one topic at a
time (producing the clause form/unification), so that the user can concentrate
on an isolated topic. Levels 4-12 are defined to combine several topics. Each level
above 4 can assume the functionality of each level below it, so that the user must
integrate the knowledge already practiced with a new technique in the
current problem solving process.

The idea of knowledge scaffolding [5] underlies this approach: the system 
selects lower level exercises for non-expert users and encourages the user to 
solve them in a stepwise fashion, whereas a higher level user must relate several 
different concepts and methods previously shown in order to obtain the solution, 
and is allowed a more flexible interaction with the system. 
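
The two groupings described in this subsection (by guiding mode and by topic) can be read directly off Table 1. The following Python sketch encodes the table as data and queries it. The `Level` class and the `levels_where` helper are hypothetical names introduced only for illustration; they are not part of SIAL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    number: int
    guiding: str       # "Strong" or "Weak", as in Table 1
    topic: str         # "Single" or "Compound", as in Table 1
    description: str


LEVELS = (
    Level(1, "Strong", "Single", "wff to clause form (without Skolemization)"),
    Level(2, "Strong", "Single", "wff to clause form (with Skolemization)"),
    Level(3, "Strong", "Single", "Unification procedure"),
    Level(4, "Strong", "Compound", "Resolution rule"),
    Level(5, "Strong", "Compound", "Strongly guided refutation resolution"),
    Level(6, "Strong", "Compound", "Factoring rule"),
    Level(7, "Weak", "Compound", "Weakly guided refutation resolution"),
    Level(8, "Weak", "Compound", "Pure literal removing"),
    Level(9, "Weak", "Compound", "Tautology removing"),
    Level(10, "Weak", "Compound", "Support set strategy"),
    Level(11, "Weak", "Compound", "Subsumption"),
    Level(12, "Weak", "Compound", "Hyperresolution"),
)


def levels_where(**criteria):
    """Return the numbers of the levels whose fields match all criteria."""
    return [lv.number for lv in LEVELS
            if all(getattr(lv, key) == value for key, value in criteria.items())]


print(levels_where(guiding="Strong"))   # strongly guided levels
print(levels_where(topic="Single"))     # single-topic levels
```

Querying by `guiding` reproduces the split into levels 1-6 and 7-12, while querying by `topic` reproduces the split into levels 1-3 and 4-12.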

2.2 The System Architecture 

SIAL is composed of the following four main modules (Figure 1):

1 Interface Module. This interacts with the user. Besides the user interface,
it also contains the user manager and the problem selector modules, which
manage the access to the user accounts and retrieve the next exercise to
be solved. This module is also tightly related to the help manager and the
DB interface.

2 Lexical-Syntactic Module. This is designed to control the user input.

3 Diagnostic Module. It is based on the tool called SLI (see Section 1) as
the domain model. The system compares the SLI output with the user’s
answer. This approach is taken in each stepwise level and also in most of
the non-stepwise levels. This module includes a step-by-step theorem prover
(SLI), an expert system engine (CLIPS) and a Strategic manager. Also, this
module can make use of the next one, in order to verify some of the user’s
answers.

4 Automatic Resolution Module. It makes use of OTTER, a first order
logic theorem prover. This module is used in automatic resolution and whenever
an internal complete refutation has to be obtained.

Fig. 1. SIAL system architecture.

As the user progresses, the system selects exercises of slightly increasing
complexity. The set of exercises and their associated levels can be modified by
the instructor, since this information is included in the database.

2.3 User Interface 

The SIAL interface is designed to allow the user to interact in two ways: graph- 
ical interaction using the mouse and some dialogs, and free interaction through 
textual input. The graphical interaction is mainly used in putting wff into clause 
form (strongly guided mode), whereas dialogs are used in specific steps (unifica- 
tion, resolution, factorization, clause selection, . . . ). Advanced users are allowed 
a textual (free) interaction. Other interface features are: 

Direct Manipulation. In order to avoid the unintentional introduction of new
symbols, and to detect in each step the part of the expression being manipulated,
the interaction allowed in the strongly guided mode relies on
direct manipulation. A specific button bar has been developed for the process
of putting the expressions into clause form. It has a button for each logical
operator (∧, →, ↔, ...). The user introduces one of them in an expression
by selecting it in the button bar and dragging it into the part of the expression
where he wants the connective to be placed. The button bar has
additional buttons for variable renaming, skolemization, expression removal
and introduction/removal of carriage returns.

Fig. 2. A window of SIAL: converting to clause form window (strongly guided mode).

Intelligent Interface. To facilitate the selection of (sub)expressions, an intelligent
interface has been developed, which selects the whole subexpression
affected by any symbol as the mouse passes over it. This feature not only
improves ease of use, but also fulfills a pedagogical goal: it helps the
user to think always at the subexpression level, never in terms of single characters.

Usability. One important objective in SIAL is to offer a functional and flexible 
interface. This led us to include several alternatives for manipulating expres- 
sions. A color code was established in order to improve the understanding 
of the operations being made. For instance, Figure 2 shows the dragging 
of a subexpression, which is displayed beside the cursor in red, indicating 
that the subexpression cannot be dropped in this part of the window to be 
manipulated. Also, unbalanced brackets are shown in red whereas balanced 
ones are in blue. Output is also graphical whenever it is possible. Within 
Levels 5-12 a refutation is the goal pursued so a graphical refutation tree is 
constructed. This output has also a color code: the last generated clause is 
displayed in red, unifiers in blue and duplicated clauses in green. 



3 The Error Diagnosis Subsystem 

Error diagnosis is the task of inferring the learner’s knowledge by analyzing the 
user’s behavior. It is a necessary task to support individual-adapted instruction. 
Many techniques have been proposed to develop a student model [6]. Model 
tracing is perhaps one of the most successful and some approaches using this 
technique have been proposed. A severe drawback of model-tracing tutors is the 
high cost of the development of an expert domain model. 
In our case, error diagnosis is strongly influenced by the level of interaction
allowed between the system and the student. Within the strongly guided levels, 
the student is encouraged to perform every step necessary to yield a solution. 
The input is usually entered by direct manipulation using the graphical interface. 
This interface is designed to act as a user input supervisor, restricting the number 
of possible errors. When an error is detected, the system interprets it using direct 
inference, matching the error with the student’s mistake. 

As the student gains familiarity with the basic skills, a more flexible
interface is provided. Within the weakly guided levels, the general strategy
changes and processing is divided into two main steps: (i) exercise statement
and (ii) some new technique to solve the stated exercise. In the exercise
statement step, the user is presented with a set of first-order logic sentences
and must provide the corresponding set of final clauses. Because the user only
provides the final statement, the direct inference diagnosis method is
unreliable, or even impossible, at this step. To overcome this problem, the
system follows a different method: it yields as a diagnosis the set of clauses
that cannot be matched to any clause obtained internally. This kind of weak
diagnosis still provides the user with useful information about the quality of
the answer, and lets the user narrow the search for his/her mistake(s).

One of the main goals of the design of SIAL was to achieve a flexible
interaction with the user, but the more freedom the user is allowed, the more
difficult it is to yield a diagnosis. Therefore, up to two paradigms can be
applied to obtain the diagnosis [7]: a model-based diagnosis to check the
validity of the user’s answer and, whenever that does not yield an accurate
diagnosis, an expertise-based diagnosis. The next two subsections explain these
two approaches.

3.1 Model-Based Diagnosis 

Model-based diagnosis is a powerful diagnosing tool whenever a model is
available. It uses the model to generate the correct output, to identify the
discrepancies between that output and the user’s answer, and to decide what
might be causing the errors [8].

In the development of SIAL a practical approach has been taken. A theorem
prover (SLI), capable of producing the clause form from first-order logic
expressions and performing resolution refutation (for non-Horn clauses), is used
as the domain model and supports the intelligent tutor. This also lets us allow
the user a very free interaction with the program, as it can obtain the next
correct solution from the user’s last correct answer. Hence no solution is coded
within the exercise statement, since SIAL can obtain the correct solution on its
own. Furthermore, the user can present his/her answer in a slightly different
format from the one obtained internally, and the system is able to match both
answers in order to compare them [9].

SIAL will accept (and assume) the user’s answer if both expressions, the
user’s answer and SIAL’s internal solution, are equal except for alphabetic
variations and for Skolem function and constant names, in which an alphabetic
variation is also allowed (comparison criterion). This approach is taken through
almost every check the system carries out, which means a more flexible
interaction between the user and the tutor. The user will be able to choose the
variable names, the Skolem function names, the Skolem function arguments, the
predicate order, etc. Thus, SIAL is able to accept small solution variations
generated by the user. Figure 3 shows two examples of use of the comparison
criterion.

Original expression        Internal solution          User’s answer         Result
(∀X)(p(X) → q(X))          ¬p(X) ∨ q(X)               ¬p(E) ∨ q(W)          Rejected
(∀X)(∃Y)(p(X) → q(X,Y))    ¬p(X) ∨ q(X, f_sko3(X))    ¬p(Z) ∨ q(Z, g(Z))    Accepted

Fig. 3. Two examples of answers to be compared and the comparison result.
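The comparison criterion can be illustrated with a minimal Python sketch. It is
not SIAL's implementation: the term encoding (nested tuples, variables written
as capitalised strings), the function names and the `skolems` parameter are all
assumptions made for this example. It checks equality up to a one-to-one
renaming of variables and of Skolem symbols, and tries every literal order; the
freedom to change Skolem function arguments mentioned in the text is not covered.

```python
from itertools import permutations


def is_var(term):
    """Variables are strings starting with an upper-case letter (X, Y, Z...)."""
    return isinstance(term, str) and term[:1].isupper()


def bind(renaming, a, b):
    """Record a <-> b in a one-to-one renaming; fail on any conflict."""
    forward, backward = renaming
    if forward.get(a, b) != b or backward.get(b, a) != a:
        return False
    forward[a], backward[b] = b, a
    return True


def match(a, b, var_map, sko_map, skolems):
    """True if internal term `a` equals user term `b` up to renaming."""
    if is_var(a) or is_var(b):
        return is_var(a) and is_var(b) and bind(var_map, a, b)
    if isinstance(a, str) or isinstance(b, str):  # constants
        if not (isinstance(a, str) and isinstance(b, str)):
            return False
        return bind(sko_map, a, b) if a in skolems else a == b
    if len(a) != len(b):  # compound terms: (symbol, arg1, arg2, ...)
        return False
    head_ok = bind(sko_map, a[0], b[0]) if a[0] in skolems else a[0] == b[0]
    return head_ok and all(
        match(x, y, var_map, sko_map, skolems) for x, y in zip(a[1:], b[1:])
    )


def same_clause(internal, user, skolems=frozenset()):
    """Compare two clauses, given as lists of (is_positive, atom) literals."""
    if len(internal) != len(user):
        return False
    for candidate in permutations(user):  # literal order is free
        var_map, sko_map = ({}, {}), ({}, {})  # fresh renamings per attempt
        if all(
            s1 == s2 and match(a1, a2, var_map, sko_map, skolems)
            for (s1, a1), (s2, a2) in zip(internal, candidate)
        ):
            return True
    return False


# The two rows of Fig. 3.
rejected = same_clause(
    [(False, ("p", "X")), (True, ("q", "X"))],
    [(False, ("p", "E")), (True, ("q", "W"))],
)
accepted = same_clause(
    [(False, ("p", "X")), (True, ("q", "X", ("f_sko3", "X")))],
    [(False, ("p", "Z")), (True, ("q", "Z", ("g", "Z")))],
    skolems={"f_sko3"},
)
print(rejected, accepted)
```

In the first row the single variable X would have to be renamed to both E and
W, so the renaming is not one-to-one and the answer is rejected; in the second
row X is renamed to Z and f_sko3 to g consistently, so the answer is accepted.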

Now, the diagnosis approach is easy. If the user’s answer cannot be accepted 
an error is detected. Almost every error detected in unification and resolution 
processes will rely on a subset of substitutions that does not comply with the 
comparison criterion established, so this subset can be used to explain why the 
user’s answer is rejected. This approach can also be taken in every other pro- 
cess based on those mentioned. For instance, factoring and subsumption greatly 
rely on unification, in addition to the checking on the suitability of the literals 
chosen, so that an error detected at the unification process will be an error at 
the corresponding factoring/subsumption process. Also, the hyperresolution is 
based on resolution and unification processes; the detection carried out on those 
processes will be the most important part of the diagnosis process, jointly with 
the checking on the adequacy of the clauses selected. 

Obtaining the clause form requires special treatment. Even the strongly
guided mode requires additional processing, as most of the time it deals with
logical expressions without any restriction. In this case, the comparison
criterion only lets SIAL accept or reject the user’s answer; it does not provide
enough information to obtain an accurate diagnosis. It is even worse if the user
carries out several steps simultaneously. To deal with this eventuality, both
answers are converted into clause form and a one-to-one clause matching (using
the comparison criterion) is tried. If such a match can be obtained, the user’s
answer is accepted; otherwise it is rejected. As a diagnosis is very difficult
to obtain in this case, the method described in the next section is applied in
order to provide the user with useful information about the error found.



3.2 Knowledge-Based (Expertise) Diagnosis 

This kind of diagnosis is invoked only when a user’s error is detected while
clause form is being produced. To be able to yield a diagnosis, three basic
suppositions have been adopted: (i) a single connector has been modified, (ii)
the user has only made one mistake (the usual single-fault assumption), and
(iii) the user is in strongly guided mode. Then, an error catalog is checked in
order to locate the mistake made. This error catalog is implemented as an expert
system, and the CLIPS engine is invoked to match the error. Some mistakes and
their associated explanations are shown in Table 2. Each mistake is converted
into a rule. The set of rules can be viewed as a cause-effect network where the
expert system performs a simple set covering strategy very close to the one
proposed in [10]. Whenever it is not possible to yield a diagnosis, a default
message is sent to the user asking him/her to carry out only one modification in
the expression at a time.

Table 2. Pieces of the expert system. Some mistakes and the associated explanations.

Original expression   User expression                         Explanation provided
expr_l ↔ expr_r       (expr_l → expr_r) ∨ (expr_r → expr_l)   “∧/∨ mistake at connector substitution”
expr_l → expr_r       expr_l ∨ ¬expr_r                        “→ / ∨¬. Incorrect substitution. Negation has
                                                              been placed on the wrong side”
¬(expr_l ∧ expr_r)    ¬(expr_l ∨ expr_r)                      “¬∧ / ¬∨. Incorrect substitution. The change of
                                                              connector cannot be made”
¬(expr_l ∧ expr_r)    ¬expr_l ∧ ¬expr_r                       “¬∧ / ¬∧¬. Incorrect substitution. The main
                                                              connector has not been changed”

Fig. 4. Strongly guided unification dialog: (a) unification dialog; (b) error in unification.
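As an illustration of how such an error catalog can be matched, the following
Python sketch encodes the four mistakes of Table 2 as pattern pairs. It is only
a stand-in for the CLIPS rule base used in SIAL: the tuple encoding of formulas,
the operator names (`iff`, `imp`, `and`, `or`, `not`), the placeholders `L` and
`R`, and the wording of the default message are assumptions made here.

```python
# Each entry: (pattern of the original expression,
#              pattern of the user's wrong rewriting,
#              explanation shown to the user).
CATALOG = [
    (("iff", "L", "R"),
     ("or", ("imp", "L", "R"), ("imp", "R", "L")),
     "∧/∨ mistake at connector substitution"),
    (("imp", "L", "R"),
     ("or", "L", ("not", "R")),
     "→ / ∨¬. Incorrect substitution. "
     "Negation has been placed on the wrong side"),
    (("not", ("and", "L", "R")),
     ("not", ("or", "L", "R")),
     "¬∧ / ¬∨. Incorrect substitution. "
     "The change of connector cannot be made"),
    (("not", ("and", "L", "R")),
     ("and", ("not", "L"), ("not", "R")),
     "¬∧ / ¬∧¬. Incorrect substitution. "
     "The main connector has not been changed"),
]

# Hypothetical wording of the default message described in the text.
DEFAULT = "Please carry out only one modification in the expression at a time."


def match(pattern, expr, env):
    """Match `expr` against `pattern`, binding the placeholders L and R."""
    if pattern in ("L", "R"):
        if pattern in env:
            return env[pattern] == expr
        env[pattern] = expr
        return True
    if (isinstance(pattern, tuple) and isinstance(expr, tuple)
            and len(pattern) == len(expr)):
        return all(match(p, e, env) for p, e in zip(pattern, expr))
    return pattern == expr


def diagnose(original, user):
    """Return the explanation of the first catalogued mistake that fits."""
    for before, wrong, explanation in CATALOG:
        env = {}
        if match(before, original, env) and match(wrong, user, env):
            return explanation
    return DEFAULT


print(diagnose(("imp", "p(X)", "q(X)"), ("or", "p(X)", ("not", "q(X)"))))
```

The same bindings for `L` and `R` must fit both the original expression and the
user's rewriting, which reflects the single-connector, single-fault assumptions
stated above.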

4 A Running Example 

SIAL has been implemented as a Master’s thesis in the Computer Science
Department at the University of Valladolid (Spain). The software is currently
being tested by a group of volunteers in order to check the accuracy of the
diagnosis provided by the system, as a step prior to testing it in a real
classroom situation. This section shows some examples of the usual interaction
between the student and SIAL using some input and output screens.

Figure 4(a) shows the strongly guided unification dialog. This dialog allows
the user to unify two literals by building the sequence of bindings from the set
of terms listed just below each literal. Each proposed binding is checked and
applied to both literals if it is valid. Figure 4(b) shows a unification error.
The user is informed that the proposed substitution g(h(Y),c)/Y is not valid
because Y ∈ g(h(Y), c), that is, the variable occurs in the term it would be
bound to. The Help button shown in Figure 4(b) leads to a textual explanation of
the misunderstood topic.
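The check behind this error message is the usual occurs check of unification. A
minimal sketch, assuming terms are encoded as nested tuples of the form
`(symbol, arg1, ...)` with variables and constants as plain strings (an encoding
chosen for this example, not taken from SIAL):

```python
def occurs(var, term):
    """True if variable `var` appears anywhere inside `term`."""
    if term == var:
        return True
    if isinstance(term, tuple):  # (function symbol, arg1, arg2, ...)
        return any(occurs(var, arg) for arg in term[1:])
    return False


# The binding g(h(Y), c)/Y from Figure 4(b) must be refused.
term = ("g", ("h", "Y"), "c")
print(occurs("Y", term))
```

A binding t/V is only applied when `occurs(V, t)` is false; otherwise the
substitution would never terminate when expanded.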
Fig. 5. Strongly guided factorization dialog: (a) a first error message in
factorization; (b) a second error message.

Fig. 6. Hyperresolution process: (a) nucleus selection; (b) graphical
hyperresolution dialog.



Figure 5(a) shows the factorization dialog. An error message has been
generated. It informs the user that the selected literals
p(X, f(X), g(h(Y), c), Z) and r(Z) cannot be factorized together. A second
mistake when selecting the literals will produce a different message, the one
shown in Figure 5(b), where the user is provided with a more detailed message:
Unifiers cannot be obtained because of literals cannot be unified or are already
equals.

Figure 6 shows part of the hyperresolution process implemented in SIAL. The
dialog shown in Figure 6(a) lets the user select the nucleus clause from a set
of previously chosen clauses; every other clause in the set (the satellites)
must clash with the nucleus. Figure 6(b) shows the resulting graphical
hyperresolution.

5 Conclusions 

In this work, SIAL, an ITS for learning computational logic, has been presented, 
paying special attention to the error diagnosis subsystem, as it is one of the most 
important modules. This module makes use of an automatic theorem prover and 
an expert system for yielding a diagnosis. The combination of the model-based 
and expertise approaches allows us to provide the student with an accurate 
assistant. SIAL is able to interact with the user in different ways, depending
on the skills already acquired. This feature influences the diagnosis provided by 
SIAL, since beginners will obtain a more precise error message than advanced 
users. The model-based approach chosen lets SIAL detect most of the user’s 
errors accurately, inform the user about the error, and provide some hint to 
localize and fix the mistake. 

Abstract. This paper proposes to automate the generation of shellfish
exploitation plans, which are drawn up by Galician extracting entities. To
achieve this objective a CBR-BDI agent is used. This agent adapts the
exploitation plans to the environmental characteristics of each school of
shellfish. Agents of this kind operate in changing and dynamic environments, so
particular attention is paid to the reasoning model that they include. The agent
reasoning model is guided by the phases of the CBR life cycle, using different
technologies for each phase. The use of an adaptive neuro-fuzzy inference system
in the reuse phase is a distinctive feature.



1 Introduction 

There are different types of agents and they can be classified in different ways
[20]. One of these types is the so-called deliberative agent with a BDI
architecture, which is characterized by having the mental attitudes of Beliefs,
Desires and Intentions; in addition, such agents are able to decide what to do
and how to achieve their objectives according to their attitudes [20] [9] [14]
[2].

The formalisation and implementation of BDI agents is the field of research of
many scientists [9] [14] [8] [16]. Some of them question the need to study
multi-modal logics for the formalisation and construction of such agents,
because these logics have not been completely axiomatised and are not
computationally efficient. Rao and Georgeff [13] state that the problem lies in
the wide gap between the powerful logics for BDI systems and practical systems.
Another problem is that this type of agent is not able to learn, a necessary
ability since agents must be constantly adding, modifying or eliminating
beliefs, desires and intentions. Therefore it would be convenient to include a
reasoning mechanism which includes a final learning stage.

This work shows how to build deliberative agents, using a case-based reasoning
(CBR) system, that solve the problems mentioned above. In the reasoning process
of these agents, a GCS network (Growing Cell Structures) and an ANFIS model
(Adaptive Neuro-Fuzzy Inference System) are used. The GCS network is used in the
retrieve phase whereas the ANFIS model implements the adaptation phase.

This paper is structured as follows. Section 2 details the reasoning model of
CBR-BDI agents. Section 3 proposes to automate the generation of shellfish
exploitation plans, which are drawn up by Galician extracting entities; the
results are also analysed in that section. Finally, Section 4 presents some
conclusions.



2 Reasoning Model of CBR-BDI Agents 

The relationship between CBR systems and BDI agents can be established implement- 
ing a case as a set of beliefs, together with an intention and a desire which caused the 
resolution of the problem. Using this relationship agents can be implemented (concep- 
tual level) using CBR systems (implementation level). Then we are mapping agents 
into CBR systems. The advantage of this approach is that a problem can be easily 
conceptualised in terms of agents and then implemented in the form of a CBR system 
[3] [4] [5] [6]. Once the beliefs, desires and intentions of an agent are identified, the 
reasoning model can be established, in the way presented in this section. 

The reasoning cycle of a typical CBR system includes four steps that are carried
out cyclically in sequence: retrieve, reuse, revise, and retain [1]. The case
base stores all the experiences which can be used by a CBR-BDI agent. Therefore,
the first action which must be done is to find groups of similar cases,
considering the values taken by the different variables.

In order to obtain such groups a GCS net is used. This kind of net is also used
by other authors in the CBR retrieve phase [7]. The information provided by the
net includes: a) how many groups are created, b) which cases take part in each
group, c) which is the prototype case representing all the cases in the group
and d) what is the distance between each case within the group and the prototype
case.

For each identified set, a TSK rule is obtained [17]. Together, these rules
constitute the initial fuzzy inference system. The antecedent of each rule is a
combination of the variables which describe the initial belief of each case.
They can be represented by a Gaussian function. In order to obtain the rule
consequents, the least squares method is used [11].

This initial fuzzy inference system will be used as previous knowledge in the AN- 
FIS model, which will adjust the parameters of both antecedents and consequents, 
using the hybrid learning method explained in section 2.2. The refinement of these 
parameters is done using as input patterns the most similar cases retrieved in the pre- 
vious phase. The result is a new fuzzy inference system, which will estimate the reso- 
lution of a new problem. 



2.1 Retrieve Phase: GCS Network 

The type of GCS used in this work is characterized by a two-dimensional space, 
where the units (cells) are connected and organised in triangles. Each cell in the net- 
work is associated with a weight vector, w, which has the same dimension as the input 
data. At the beginning of the learning process, the weight vector of each cell is initial- 
ised with random values. The basic learning process in a GCS network consists of 
topology modification and weight vector adaptations [10]. This vector is the prototype 
case of each cell of the network. 
For each training case, the network performs a so-called learning cycle, which
may result in topology modification and weight vector adaptation. In the first
step of each learning cycle, the cell c with the smallest distance between its
weight vector, $w_c$, and the actual input vector, $x$, is chosen as the winner
cell or best-match cell (see equation (1)).

$$ c : \lVert x - w_c \rVert \le \lVert x - w_i \rVert, \quad \forall i \in O \qquad (1) $$

The second step consists of the adaptation of the weight vectors of the winning
cell and its neighbouring cells; see equations (2) and (3). The terms
$\varepsilon_c$ and $\varepsilon_n$ represent the learning rates for the winner
and its neighbours respectively. Both learning rates are constant during
learning, and $\varepsilon_c, \varepsilon_n \in [0, 1]$.

$$ w_c(t+1) = w_c(t) + \varepsilon_c (x - w_c) \qquad (2) $$

$$ w_n(t+1) = w_n(t) + \varepsilon_n (x - w_n), \quad \forall n \in N_c \qquad (3) $$

In the third step of a learning cycle, each cell is assigned a signal counter,
$\tau$, that reflects how often a cell has been chosen as winner (see equations
(4) and (5)).

$$ \tau_c(t+1) = \tau_c(t) + 1 \qquad (4) $$

$$ \tau_i(t+1) = \tau_i(t) - \alpha \tau_i(t), \quad i \ne c \qquad (5) $$

The parameter $\alpha$ reflects a constant rate of counter reduction for the
rest of the cells at the current learning cycle. Growing cell structures also
modify the overall network structure by inserting new cells into those regions
that represent large portions of the input data. The frequency of insertion
update is controlled by the parameter $\lambda$, which is associated with the
number of learning cycles between two cell insertions (see equations (6), (7)
and (8)).

$$ h_i = \frac{\tau_i}{\sum_j \tau_j}, \quad \forall i, j \in O \qquad (6) $$

$$ q : h_q \ge h_i, \quad \forall i \in O \qquad (7) $$

$$ r : \lVert w_r - w_q \rVert \ge \lVert w_p - w_q \rVert, \quad \forall p \in N_q \qquad (8) $$
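The three steps of a learning cycle, equations (1) to (5), can be sketched in
plain Python as follows. This is only an illustration: the function name, the
list-based data layout and the default values of the learning rates and of the
counter reduction rate are assumptions, and the cell insertion of equations (6)
to (8) is left out.

```python
import math


def learning_cycle(weights, counters, neighbours, x,
                   eps_c=0.1, eps_n=0.01, alpha=0.05):
    """Run one GCS learning cycle in place and return the winner index.

    weights    -- list of weight vectors, one per cell
    counters   -- list of signal counters, one per cell
    neighbours -- dict: cell index -> indices of its topological neighbours
    x          -- input vector (training case)
    """
    # Step 1, eq. (1): the winner is the cell closest to the input.
    c = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))

    # Step 2, eqs. (2) and (3): move the winner and its neighbours towards x.
    weights[c] = [w + eps_c * (xi - w) for w, xi in zip(weights[c], x)]
    for n in neighbours[c]:
        weights[n] = [w + eps_n * (xi - w) for w, xi in zip(weights[n], x)]

    # Step 3, eqs. (4) and (5): reward the winner, decay every other counter.
    for i in range(len(counters)):
        if i == c:
            counters[i] += 1
        else:
            counters[i] -= alpha * counters[i]
    return c


# A single triangle of cells in a two-dimensional input space.
weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
counters = [0.0, 0.0, 0.0]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
winner = learning_cycle(weights, counters, neighbours, [0.9, 0.1])
print(winner, weights[winner], counters)
```

After enough cycles, the weight vector of each cell can be read as the prototype
case of the group of cases it wins.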

The GCS network indicates the prototype case of each node and its topology, and
calculates the scale parameter σ of each node [7]. This parameter measures the
width of the Gaussian membership function, as can be seen in Figure 1. Higher
values of σ extend the area around the centroid in which the node dominates.

To calculate σ for node j, the prototype cases of its neighbour nodes are
selected; then the average of the squared distances between them is calculated
[18] [19], that is:




where K is a particular variable in the cases.

Fig. 1. The point mean cases in the input environment.



This information is used to build the initial fuzzy inference system, which is
then used by the ANFIS model. This fuzzy inference system has a set of TSK
rules; each node provides a fuzzy rule. The rules have the form:

$$ R_j : \text{if } x_1 \text{ is } A_{1j} \text{ and } x_2 \text{ is } A_{2j} \text{ and } \ldots \text{ and } x_M \text{ is } A_{Mj}, \text{ then } y = g_j(x_1, x_2, \ldots, x_M) \qquad (10) $$

where $g_j(\cdot)$ is a polynomial function in x.

Each attribute is represented by a Gaussian function (equation (11)), which
forms part of the antecedent of the rule.

$$ A_{ij}(x_i) = \exp\left( -\frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2} \right) \qquad (11) $$

where c is a prototype case and σ the distance.

The next step consists of obtaining the consequents of each TSK rule. The method
used is least squares [11]. This initial fuzzy inference system is then adapted
with the ANFIS model.
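To make the structure of such a fuzzy inference system concrete, the sketch
below evaluates a small set of TSK rules. Several choices are assumptions made
for the example rather than details given in the paper: the exact Gaussian form
(with a factor 2 in the denominator), the product used as the "and" operator,
the first-order (linear) consequents, and all names and numeric values.

```python
import math


def gaussian(x, centre, sigma):
    """Gaussian membership degree (assumed form, see the lead-in)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


def tsk_predict(x, rules):
    """Weighted average of the rule consequents for the input vector x.

    Each rule is a triple (centres, sigmas, coeffs) with a linear consequent
    y = coeffs[0] + coeffs[1] * x[0] + coeffs[2] * x[1] + ...
    """
    numerator = denominator = 0.0
    for centres, sigmas, coeffs in rules:
        # Firing strength: product of the memberships of every attribute.
        strength = math.prod(
            gaussian(xi, c, s) for xi, c, s in zip(x, centres, sigmas)
        )
        y = coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))
        numerator += strength * y
        denominator += strength
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    return numerator / denominator


# Two hypothetical one-variable rules, one per GCS node (prototype 0 and 2).
rules = [
    ([0.0], [1.0], [1.0, 0.0]),  # near 0 the output is about 1
    ([2.0], [1.0], [3.0, 0.0]),  # near 2 the output is about 3
]
print(tsk_predict([1.0], rules))
```

An input halfway between both prototypes fires both rules equally, so the
prediction is the average of the two consequents.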

In this phase the cases most similar to the new problem are retrieved. The
problem is determined by a set of variables with particular values, which are
used as inputs of the GCS net. Next, the search for the node to which the new
problem belongs is started, that is, the winner node must be found. This node is
obtained by calculating the Euclidean distance between the new problem and the
prototype case of each node. The node with the smallest distance will be the
winner. All the cases associated with the winner node are considered the most
similar ones, and will be used in the following phase.



2.2 Reuse Phase: ANFIS Model 

One of the first hybrid neuro-fuzzy systems for function approximation was
Jang’s ANFIS model [11]. ANFIS adjusts only the membership functions of the
antecedents and the consequent parameters.

Because ANFIS uses only differentiable functions, it is easy to apply standard 
learning procedures from neural network theory. For ANFIS a mixture of backpropa- 
gation (gradient descent) and least squares estimation (LSE) is used. Backpropagation 
is used to learn the antecedent parameters, i.e. the membership functions, and LSE is 
used to determine the coefficients of the linear combinations in the rule’s consequents. 
A step in the learning procedure has two parts, which are shown in Table 1. In
the first part the input patterns are propagated, and the optimal consequent
parameters are estimated by an iterative least mean squares procedure, while the
antecedent parameters are assumed to be fixed for the current cycle through the
training set. In the second part the patterns are propagated again, and in this
epoch backpropagation is used to modify the antecedent parameters, while the
consequent parameters remain fixed.



Table 1. Two passes in the hybrid learning procedure for ANFIS.

                        Forward pass             Backward pass
Antecedent parameters   Fixed                    Gradient descent
Consequent parameters   Least-square estimator   Fixed
Signals                 Node outputs             Error signals



In this phase, the cases retrieved in the previous one are used, that is, the
inference system provided by the GCS net. The result is a fuzzy inference system
adapted for solving a particular problem. With this system a result is obtained,
which becomes the desire that the CBR-BDI agent must achieve.

Therefore, the next step the CBR-BDI agent must accomplish is planning which
actions to take to achieve this desire. The actions carried out in the retrieved
cases are obtained and the following process is applied. A directed acyclic
structure is created whose first vertex is the new problem and whose last vertex
is the desire to achieve. This structure is built by taking each of the actions
performed in the retrieved cases and adapting their parameter values. Once the
structure is built, Dijkstra’s algorithm [15] is used to determine the shortest
path, taking the new problem as origin. The path determines the actions which
must be done, and a new intention is built. This intention reflects the solution
to the posed problem.
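The planning step can be sketched with a standard implementation of Dijkstra's
algorithm. The graph below is purely hypothetical: the paper does not state how
vertices are named or how edge costs are obtained, so the intermediate states
`s1` to `s3` and the numeric costs are assumptions made for the example.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict graph: node -> {successor: cost}.

    Returns (path, cost), or (None, inf) if the target is unreachable.
    Edge costs must be non-negative.
    """
    dist = {source: 0.0}
    previous = {}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale queue entry
        done.add(u)
        if u == target:
            break
        for v, cost in graph.get(u, {}).items():
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                previous[v] = u
                heapq.heappush(heap, (candidate, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], dist[target]


# Hypothetical structure: each edge stands for an adapted action.
graph = {
    "problem": {"s1": 2.0, "s2": 1.0},
    "s1": {"desire": 1.0},
    "s2": {"s3": 1.0},
    "s3": {"desire": 2.0},
}
path, cost = shortest_path(graph, "problem", "desire")
print(path, cost)
```

The sequence of edges along the returned path corresponds to the sequence of
actions that forms the new intention.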

Summarising, in this phase a sequence of actions starting from a new problem and 
the result which must be achieved, is proposed; that is, a new case. 



2.3 Revision and Retain Phases 

In the revision phase the solution obtained in the previous phase is evaluated.
The revision can be carried out using expert knowledge (rules) or simulation
techniques [5], a fuzzy inference system [7] or belief-revision techniques [12].

The new case (a problem, a solution and a result) is stored in the case base. In
this phase the error between the estimated and the real result is calculated. If
the error is higher than a limit P, the GCS network is rebuilt, because this
means that a region of the input space which has not been considered before is
being visited. Therefore the network must modify its topology and adapt the
weight vectors, also using the newly stored cases.



3 Study Case: Shellfish Exploitation Plans Automation 

In Galicia, there is a deep interest in regulating the fishing sector. Because
of that, a set of rules for organising the exploitation of marine resources was
developed. According to those rules, the Galician government is in charge of
controlling and regulating both the extraction of marine resources on the
Galician coast and the commercial transactions which take place in the specific
locations devoted to that aim.

In order to extract marine resources it is necessary to submit to the
administration documents named shellfish exploitation plans, which are drawn up
by entities interested in exploiting Galician marine resources. Each extracting
entity must prepare an exploitation plan for every resource it wants to obtain.
The aim of these plans is to achieve the greatest sustained economic profit from
marine resources by means of appropriate planning of the extracting activity.

In Table 2, sections and topics included in an exploitation plan are shown. 



Table 2. Exploitation plan summary.

General data             Goals        Evaluation    Pursuit
Shellfish men's number   Production   Methods       Daily effort
Boats number             Economical   Conclusions   Daily production
Exploitation areas

Extraction plan      Improvement actions   Financial plan
Probable schedule    Description           Incomes
Dates                Costs                 Expenses
Limit (Kg/day)                             Investment
Fishing traps                              Capitalization
Points of control
Selling ways
Surveillance


The shellfish exploitation plan presented by an extracting entity has values for
a complete year and must be approved by the Fishing Authority. From the moment
of its approval, the shellfish exploitation plan regulates throughout the year
the capture of the resources that it covers.

Notice that plans drawn up by the extracting entities incorporate general data,
such as the number of shellfish men, selling ways, etc., but they must also
include forecasts of productive and economic goals, which have to be based on
the characteristics of the extracting entities, the environment and the
conditions of the shellfish ecosystems.

Since the information managed by these plans is extensive enough, an automated 
system that gets, stores and analyses data about marine resources becomes essential. 
Appropriate tools for collecting data allow the acquisition of useful knowledge for 
managing marine resources by means of rational criteria. These tools provide help not 
only to entities in charge of exploitation, assisting them in the process of elaborating 
new plans, but also to administration, designing better fisheries policies that prevent 
overexploitation generated by an excessive fishing effort. 

Nevertheless, the interest of the current shellfish management system goes
beyond a simple statistical study: it aims to adapt the exploitation plans to
the characteristics of each school of shellfish and the needs of the shellfish
men. In this way, different management models can be applied each year and
different technical solutions can be proposed in order to help in decision
making.

To achieve this objective, a CBR-BDI agent like the one described in this work
will be used. This agent will automatically generate the shellfish exploitation
plan, allowing a reasonable and sustainable exploitation, which is a desirable
objective for the whole shellfish sector.

Next, an example is presented of the application of a CBR-BDI agent to
automatically generate shellfish exploitation plans for all Galician extracting
entities devoted to clam. All the information available in the shellfish
exploitation plans for this resource from previous years is used. This allows
the productive and economic goals to be adjusted. However, to simplify the
example, only an estimation of productive goals (kilograms of the resource) is
made. To obtain the rest of the data, the same process would be followed.

First, it is necessary to define the CBR-BDI agent in terms of the 4-tuple
{E, CB, GAL, EK}, where E are the variables which describe the environment, CB
is the case base (in terms of beliefs, desires and intentions), GAL is the
general actions library and EK identifies the expert’s knowledge [4].

The variables which describe the example environment are the typical ones for
extracting entities (number of working days, number of shellfish men × number of
days) and for schools of shellfish: environmental data (temperature, water
salinity, pH, oxygen rate, transmittance and fluorescence), the size of the
different schools of shellfish (area and perimeter) and the limits of allowed
captures, besides the variable that must be forecast in the exploitation plan.
This study is centred on the estimation of the number of kilograms of the marine
resource that are to be collected.

For building the CBR-BDI system, a tool called GABDI (abbreviation in Spanish of
BDI agents Generator) is available. This tool facilitates the addition of
information to the case base.

In order to define the variables which describe the environment, the GABDI tool
provides a form where the name of the variable and the range of values it may
take can be entered. The next step is to describe the actions which can be
performed on the environment. Table 3 shows the actions which can be done to
improve the harvest of a particular marine resource.



Table 3. Actions which can be done to improve the production of a marine resource.

Action name       Input parameters   Output parameters
Transport         Oxygen             Oxygen, Kilograms
Move              Kilograms          Kilograms
Plantation        Kilograms          Kilograms
Seaweed removal   Oxygen             Oxygen, Kilograms
Plough            Area, Perimeter    Area, Perimeter, Kilograms



Next, the fulfilled intentions must be defined; that is, the actions done
previously to solve old problems. Once the environmental variables, the actions
which can be done and the actions done in the past have been identified, it is
necessary to create the case base. In order to achieve this objective, the
states (beliefs) and the plan of actions performed (intentions) must be
indicated.

Now all the information available to the agent is stored in the case base, so it
can apply its reasoning model using the techniques described in section 2. The
information managed by the CBR-BDI agent is saved in CSV format.



3.1 Results Obtained 

The correct functioning of the model has been proved experimentally, by means of a 
set of proofs. First proofs were made over 6554 stored cases, particularly, the ones 
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representing the captures of the years 2000 and 2001 of clam. The forecasting was 
done over 192 situations belonging to year 2002. 

The results of two systems were compared. One is the initial fuzzy inference system
obtained by the GCS network (from here on, GCS) and the other is the system proposed
in the present work (from here on, CBR-BDI agent). It must be highlighted that 82.8%
of the time the forecast of the CBR-BDI agent is better than the one provided by the
GCS system.

First, we checked whether the samples to be analysed follow a normal distribution.
The normality tests applied were Z skewness and Z kurtosis. Since the result of these
tests was negative, that is, the samples do not satisfy the normality hypothesis, a
set of non-parametric tests was applied. These tests try to determine whether one
system is better than the other by analysing the data globally. The statistical
techniques used in this case were the Sign Test and the rank sum of Wilcoxon for
coincident pairs (the Wilcoxon signed-rank test for matched pairs).

The Sign Test is designed specifically for testing hypotheses about the median of a
continuous population. Like the mean, the median is a measure of the centre or
position of the distribution, which is why the Sign Test is also known as a test of
position.

Since the GCS system provides an initial fuzzy inference system in the retrieve phase,
that system can be used for forecasting without adapting its parameters by means of
the ANFIS model. To show that better results are obtained when the parameters are
adapted, the two systems were compared: the one provided by GCS and the CBR-BDI agent.

Table 4 shows the results of applying the Sign Test to the two systems introduced
above. The results indicate that the error of the GCS system is larger than that of
the CBR-BDI agent, with a confidence level of about 95%. Since the p value is very
small, the null hypothesis is also rejected in the two-tailed test.



Table 4. The Sign Test between the initial fuzzy inference system and the CBR-BDI agent.

Test                        Sign Test
Alternative hypothesis      GCS Error >= Agent Error

Difference between pairs    N
Positive                    183
Negative                    9
Zero                        0

Median difference           3088.466
95.8% CI                    2387.762 to +∞ (exact)
Sign statistic              183
One-tailed p                <0.0001 (exact)



From the results of the Sign Test, it can be concluded that the CBR-BDI agent provides
better results than the GCS system.
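
The exact one-tailed p value of a sign test can be recomputed from the pair counts
alone. The following is a minimal sketch, assuming the standard definition of the sign
test (zero differences discarded, number of positive differences distributed as
Binomial(n, 0.5) under the null hypothesis); the function name is ours and is not part
of the GABDI tool. The counts 183 and 9 are those of Table 4.

```python
from math import comb


def sign_test_one_tailed_p(n_positive: int, n_negative: int) -> float:
    """Exact P(X >= n_positive) for X ~ Binomial(n, 0.5), n = positives + negatives."""
    n = n_positive + n_negative  # zero differences are discarded beforehand
    tail = sum(comb(n, k) for k in range(n_positive, n + 1))
    return tail / 2 ** n


# Counts of positive and negative differences reported in Table 4.
p_value = sign_test_one_tailed_p(183, 9)
print(p_value < 0.0001)
```

With these counts the p value is far below 0.0001, which agrees with the last row of
the table.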

A more powerful non-parametric technique compares the whole probability distributions,
not only the medians. This test, called the rank sum of Wilcoxon (the Wilcoxon
signed-rank test for matched pairs), tests the null hypothesis that the probability
distributions associated with the two populations are equivalent against the
alternative hypothesis that the distribution of one population is shifted to the right
(or left) with respect to the other.

Table 5 shows the results of the rank sum of Wilcoxon test for the errors produced by
the GCS system and by the CBR-BDI agent.



Table 5. Rank sum of Wilcoxon between the GCS system and the CBR-BDI agent.

Test                      Rank sum of Wilcoxon test for coincident pairs
Alternative hypothesis    GCS Error >= Agent Error

Pairs difference    n      Rank sum    Mean rank
Positive            183    18377.0     100.42
Negative            9      151.0       16.78
Zero                0

Median difference         7285.193
95.0% CI                  5267.973 to +∞ (normal approximation)
Wilcoxon's statistic      18377
One-tailed p              <0.0001 (normal approximation)



Before running the test shown in Table 5, the two-tailed rank sum of Wilcoxon test for
coincident pairs was carried out, and the null hypothesis was rejected. To refine this
result, the alternative hypothesis that the GCS system error is greater than or equal
to the CBR-BDI agent error was then tested, and the null hypothesis was again rejected.
The final conclusion is that population number 1 is shifted to the right of population
number 2.

The rank sum of Wilcoxon test for coincident pairs reinforces the earlier results, in
the sense that the CBR-BDI agent provides better results than the GCS system. It can
therefore be concluded that the initial fuzzy inference system obtained by the GCS
network needs to be adapted and that the ANFIS model is the best one to use for that
purpose.
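
The normal approximation behind the Wilcoxon p value can be reproduced from the rank
sums. The sketch below assumes the textbook approximation for the signed-rank
statistic (mean n(n+1)/4, variance n(n+1)(2n+1)/24) with no tie or continuity
correction; the statistical package used for Table 5 may apply such corrections, so
the figures are indicative only. Function names are ours.

```python
from math import erfc, sqrt


def signed_rank_z(w_plus: float, n: int) -> float:
    """z score of the positive rank sum under the null hypothesis."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - mean) / sqrt(variance)


def upper_tail_p(z: float) -> float:
    """One-tailed p value P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))


n_pairs = 183 + 9                      # positive + negative differences (Table 5)
w_positive, w_negative = 18377, 151    # rank sums (Table 5)

# Sanity check: the two rank sums must add up to 1 + 2 + ... + n.
ranks_consistent = (w_positive + w_negative == n_pairs * (n_pairs + 1) // 2)

z = signed_rank_z(w_positive, n_pairs)
p_value = upper_tail_p(z)
print(ranks_consistent, round(z, 2), p_value < 0.0001)
```

The z score is close to 11.8, so the one-tailed p value is many orders of magnitude
below the 0.0001 threshold reported in the table.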



4 Conclusions 

This work is part of the objectives of the research action “Ampliacion de sistemas de
informacion geografica orientado a la gestion de los recursos especificos a los demas
recursos marisqueros de Galicia”, approved by the Xunta de Galicia with code
PGIDTCIMA 02/3. This research is carried out jointly by CIMA (Centro de
Investigacións Mariñas), the University of A Coruña and the University of Vigo.

This paper has shown how a CBR-BDI agent is able to learn and to provide solutions to
a particular problem. It uses a fuzzy inference system that is adapted to the problem
to be solved.

The acceptance of this CBR-BDI agent by the extracting entities has been excellent.
It is currently being tested in several of them. The satisfaction expressed by the
entities devoted to extracting marine resources suggests that the system will be fully
deployed next year.

At present, the shellfish exploitation plans are drawn up manually, which is slow and
unreliable. This research action proposes a simple way of entering the data, together
with the automatic preparation of the plans from that data.

As a final conclusion, the model described in this paper is capable of combining the
partial knowledge provided by each incorporated technology, creating a global
knowledge system based on the case-based reasoning method.



Abstract. The p-hub problem is a facility location problem that can be viewed
as a type of airline network design problem. Given a finite set of nodes, each 
node (city) sends and receives some type of traffic (airline passengers) to and 
from other nodes (cities). The hub (airport) locations must be chosen from 
among these nodes to act as switching points. In this paper we consider the un- 
capacitated p-hub median problem with single allocation, where each non-hub 
node (origin and destination) must be allocated to exactly one of the p hubs. We 
provide a reduced size formulation and a competitive recurrent neural model for 
this problem. The architecture of the proposed neural network consists of two 
layers (allocation layer and location layer) of np binary neurons, where n is the 
number of nodes and p is the number of hubs. The effectiveness and efficiency
of the proposed recurrent neural network under varying problem sizes are analyzed.
Computational experience with other neural networks and heuristics is
provided using data given in the literature.



1 Introduction 

Hub location research has become an important area of location theory over the last
two decades. This is due in part to the use of hub networks in modern transportation
systems such as airline networks. These systems serve demand for travel between many
origins and many destinations, where economies of scale exist in the cost of such
travel. Rather than serving every origin-destination demand with a direct link, a hub
network provides service via a smaller set of links between origins/destinations and
hubs, and between pairs of hubs. Such a network allows a large set of origins and
destinations to be connected with relatively few links, via central hub facilities.
Using few links concentrates flows and allows economies of scale to be exploited. Hub
location problems involve locating hub facilities and designing the hub network.
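
The saving in links can be made concrete with a small count. The sketch below assumes
undirected links, single allocation (each non-hub node has one link to its hub) and a
fully connected set of hubs; the function names and the example sizes (25 cities, 3
hubs) are ours and serve only as an illustration.

```python
def direct_links(n: int) -> int:
    """Links needed to connect every pair of n nodes directly."""
    return n * (n - 1) // 2


def hub_links(n: int, p: int) -> int:
    """Links in a single-allocation hub network with p fully connected hubs."""
    spoke_links = n - p                # one link per non-hub node
    inter_hub_links = p * (p - 1) // 2
    return spoke_links + inter_hub_links


print(direct_links(25), hub_links(25, 3))  # 300 25
```

For 25 cities, a network with 3 hubs needs 25 links instead of 300, which is why flows
become concentrated on few links.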

The location of hub facilities is an important issue in the design of airline
passenger networks. Passengers generally have to travel longer distances and for a
longer time because non-stop services are reduced. However, airline companies can
usually offer more frequent flights because they operate fewer routes. A good airline
network design benefits not only the airline companies but also many passengers;
consequently, many airline companies are interested in locating their own hub
airports.

Hub location problems may be classified by the way in which the demand points 
are assigned, or allocated, to hubs. One possibility is single allocation, in which each 
demand point is allocated to a single hub, i.e. each demand point can send and receive
via only a single hub. A second possibility is multiple allocation, in which a demand
point may send and receive via more than one hub. 

The problem addressed in this paper models the situation where there are n cities
(nodes), and p of these cities will be designated as hub airports. The nodes in the
network can interact with each other only via the hubs to which they have been
allocated, and each node has to be connected to exactly one of the p hubs. More
specifically, the problem studied in this paper is the uncapacitated, single
allocation, p-hub median problem (which will be referred to as USApHMP).

A quadratic integer programming formulation for this problem was proposed by
O'Kelly [7]. Since O'Kelly's original formulation, several researchers have used
various heuristics to solve this problem. O'Kelly considered the use of two heuristics
for solving an uncapacitated p-hub median problem that models the flight paths of an
airline company and attempts to assign each airport to a fixed hub. Thus, the problem
was reduced from a location-allocation problem to an allocation problem alone.
Klincewicz [6] examined ways of avoiding convergence of such heuristics to sub-optimal
local minima by using tabu search and GRASP strategies, although the problem
considered was still the simplified one of allocating nodes to fixed hubs using a
minimum distance rule. Skorin-Kapov and Skorin-Kapov [9] considered the use of tabu
search for solving the complete location-allocation problem. Aykin [1] devised various
other heuristics, Ernst and Krishnamoorthy [5] applied a simulated annealing heuristic,
and Smith, Krishnamoorthy and Palaniswami [8] considered a modified Hopfield network
to solve the uncapacitated, single allocation, p-hub problem.

In this paper we propose a recurrent neural model for solving this problem, of the
kind we have applied successfully to related problems such as the p-median problem [3].
We provide a reduced size formulation and a competitive recurrent neural model for
this problem. The architecture of the proposed neural network consists of two layers
(allocation layer and location layer) of np binary neurons, where n is the number of
nodes and p is the number of hubs. The processing units (neurons) are grouped in
assemblies, where exactly one neuron per assembly is active at any time and the
neurons of the same assembly are updated in parallel. The computational dynamics of
the network has been defined and its convergence has been proved. Moreover, the energy
function (objective function) always decreases as the system evolves according to the
proposed dynamical rule. The advantage of recurrent neural networks over more
traditional techniques lies in their potential for rapid computation when implemented
in electronic hardware, and in the inherent parallelism of the neural network. Of
course, the proposed recurrent neural network has been simulated on a digital
computer, and is therefore subject to some limitations. Certainly, satisfactory
hardware implementation is still the subject of much research, and many design
challenges lie ahead in this field. Yet there is little doubt that it is only a matter
of time before VLSI implementations of large scale neural networks are possible. The
effectiveness and efficiency of the proposed recurrent neural network under varying
problem sizes are analyzed. Computational experience with other neural networks and
heuristics is provided using data given in the literature.

The paper is organized as follows. In section 2 we review the problem formulations
and propose a new reduced size formulation. Section 3 presents the proposed
competitive recurrent neural model. Section 4 describes the proposed neural network
algorithm. In section 5, illustrative simulations and computational results using the
well known 1970 Civil Aeronautics Board (CAB) data set are reported and compared with
other heuristics. Finally, section 6 provides a summary and conclusions.



2 Problem Formulation



The hub location problem can be described as follows. We are given the locations of a
set of n nodes or cities, the volume of flow ($W_{ij}$) that must be shipped between
each origin-destination pair and the cost per unit flow ($C_{ij}$) between each
origin-destination pair. We then have to select p of these nodes to be hubs. Hubs are
airports or switching points for flow and they are fully connected. All flow travels
via hubs and each non-hub node must be allocated to a unique hub node. The locations
of the hubs (airports) and the allocation of the nodes (cities) are chosen so that the
total cost of the system is minimized. Note that every flow shipped between cities has
three separate components: collection (origin city to hub airport), transfer (hub
airport to hub airport) and distribution (hub airport to destination city).

O’Kelly [7] gave the first formulation of USApHMP as a quadratic integer program.
Although this formulation has only $n^2$ variables, the problem is difficult to solve
due to the non-convexity of the objective function. Subsequently, a mixed integer
linear program with $n^4 + n^2$ variables was developed by Campbell [2] to avoid the
non-convexity of the objective function. Ernst and Krishnamoorthy [5] developed a
reduced mixed integer linear program using $n^3 + n^2$ variables and, recently,
Ebery [4] presented a formulation with $2n^2$ variables. In this paper, we propose a
new reduced formulation for the USApHMP using $2np$ variables. The proposed
formulation is defined as follows:
Minimize

$$\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{q=1}^{p}\sum_{k=1}^{n}\left[\beta W_{ik}C_{ij}+\gamma W_{ki}C_{ji}+\alpha\sum_{m=1}^{n}\sum_{r=1}^{p}W_{ik}C_{jm}x_{kr}y_{mr}\right]x_{iq}y_{jq} \qquad (1)$$

Subject to

$$\sum_{q=1}^{p}x_{iq}=1,\quad i=1,2,\ldots,n \qquad (2)$$

$$\sum_{j=1}^{n}y_{jq}=1,\quad q=1,2,\ldots,p \qquad (3)$$

where

$x_{iq}$ = 1 if the node i is allocated to cluster q, and 0 otherwise
$y_{jq}$ = 1 if the node j is the hub of cluster q, and 0 otherwise
$W_{ij}$ is the amount of flow from the node i to the node j
$C_{ij}$ is the transportation cost between the nodes i and j
$\alpha \in [0,1]$ is the transfer coefficient
$\beta \in [0,1]$ is the collection coefficient
$\gamma \in [0,1]$ is the distribution coefficient

In the objective function (1), the first and second terms are the costs of assigning a
node to its hub for outgoing and incoming flows respectively. These terms are
multiplied by the coefficients $\beta$ (collection coefficient) and $\gamma$
(distribution coefficient) respectively. The third term counts the costs of those
interactions which must flow between hubs. These inter-hub costs are multiplied by a
parameter $\alpha$ to reflect the scale effects in interfacility flows. Constraint (2)
ensures that each node is allocated to a unique cluster and constraint (3) ensures
that one and only one hub is opened in each cluster. Note that this formulation is
very simple.
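
To make the three cost components concrete, the sketch below evaluates the objective
for a given feasible solution. It assumes that the binary variables are stored
compactly: `alloc[i]` is the cluster q with x_iq = 1 and `hubs[q]` is the node j with
y_jq = 1, so the sums over q, j, m and r collapse to single terms. The function name
and the 3-node instance are hypothetical and are not taken from the CAB data.

```python
def usaphmp_cost(W, C, alloc, hubs, alpha, beta, gamma):
    """Total cost of a feasible allocation/location pair (collapsed one-hot sums)."""
    n = len(W)
    total = 0.0
    for i in range(n):
        j = hubs[alloc[i]]                       # hub serving node i
        for k in range(n):
            m = hubs[alloc[k]]                   # hub serving node k
            total += (beta * W[i][k] * C[i][j]   # collection: i -> hub j
                      + gamma * W[k][i] * C[j][i]  # distribution: hub j -> i
                      + alpha * W[i][k] * C[j][m])  # transfer: hub j -> hub m
    return total


# Hypothetical instance: unit flows between distinct nodes, symmetric costs.
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
C = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
# Nodes 0 and 1 form cluster 0 (hub: node 0); node 2 forms cluster 1 (hub: node 2).
cost = usaphmp_cost(W, C, alloc=[0, 0, 1], hubs=[0, 2],
                    alpha=0.5, beta=1.0, gamma=1.0)
print(cost)  # 8.0
```

In this instance the collection and distribution terms contribute 2 each (only node 1
is away from its hub) and the discounted transfer term contributes 4.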



3 Competitive Recurrent Neural Network Model 



The proposed neural network consists of two layers (allocation layer and location
layer) of interconnected binary neurons or processing elements. Each neuron i has an
activation potential $h_i$ and an output $S_i \in \{0,1\}$. To design a suitable
neural network for this problem, the key step is to construct an appropriate energy
function E whose global minimum is also a solution of the above formulation. The
simplest approach to constructing such an energy function is the penalty function
method. The basic idea is to transform the constrained problem into an unconstrained
one by adding penalty terms to the objective function (1). These terms incur a high
cost if any constraint is violated. More precisely, some or all constraints are
eliminated by increasing the objective function by a quantity that depends on the
amount by which the constraints are violated. That is, the energy function of the
neural network is given by the Liapunov energy function defined as



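$$E=\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{q=1}^{p}\sum_{k=1}^{n}\left[\beta W_{ik}C_{ij}+\gamma W_{ki}C_{ji}+\alpha\sum_{m=1}^{n}\sum_{r=1}^{p}W_{ik}C_{jm}x_{kr}y_{mr}\right]x_{iq}y_{jq}+\lambda_1\sum_{i=1}^{n}\left(1-\sum_{q=1}^{p}x_{iq}\right)^{2}+\lambda_2\sum_{q=1}^{p}\left(1-\sum_{j=1}^{n}y_{jq}\right)^{2} \qquad (4)$$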



where $\lambda_i > 0$ are penalty parameters that determine the relative weight of the
constraints. Tuning the penalty parameters is an important problem associated with
this approach.

To guarantee a valid solution and avoid the parameter tuning problem, we split our
neural network into disjoint groups or assemblies according to the two constraints;
that is, for the p-hub median problem with n nodes, we have n groups or assemblies
according to constraint (2), plus p groups or assemblies according to constraint (3).
We then arrange the neurons in two matrices (one matrix per neuron type), where a
group is represented by a row or a column of the matrix, depending on the neuron type.



Fig. 1. Neuron organization for the USApHMP.



Fig. 1 shows two matrices: the first contains the allocation neurons and the second
contains the location neurons. The allocation neurons of a group are in the same row
of their matrix, and the location neurons of a group are in the same column.

In this model, one and only one neuron per group has output one, so the penalty terms
are eliminated from the objective function. The neurons of a group are updated in
parallel, which is why we introduce the notion of group update, whereas the groups
themselves are updated sequentially. The energy function of the neural network is then
reduced to




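$$E=\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{q=1}^{p}\sum_{k=1}^{n}\left[\beta W_{ik}C_{ij}+\gamma W_{ki}C_{ji}+\alpha\sum_{m=1}^{n}\sum_{r=1}^{p}W_{ik}C_{jm}x_{kr}y_{mr}\right]x_{iq}y_{jq} \qquad (5)$$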



With the new model we avoid the problem of tuning the parameters $\lambda_1$ and
$\lambda_2$ of eq. (4), because the energy function of the new model (5) has no
penalty terms.

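The activation potentials of the neurons of the network are

$$h_{x_{iq}}=-\sum_{j=1}^{n}\sum_{k=1}^{n}\left[\beta W_{ik}C_{ij}+\gamma W_{ki}C_{ji}+\alpha\sum_{m=1}^{n}\sum_{r=1}^{p}W_{ik}C_{jm}x_{kr}y_{mr}\right]y_{jq} \qquad (6)$$

$$h_{y_{jq}}=-\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\beta W_{ik}C_{ij}+\gamma W_{ki}C_{ji}+\alpha\sum_{m=1}^{n}\sum_{r=1}^{p}W_{ik}C_{jm}x_{kr}y_{mr}\right]x_{iq} \qquad (7)$$

where $h_{x_{iq}}$ is the activation potential of the allocation neuron iq and
$h_{y_{jq}}$ is the activation potential of the location neuron jq.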

The central property of the proposed network is that the computational energy
function always decreases (or remains constant) as the system evolves according to its
dynamical rule

$$x_{iq}(k+1)=\begin{cases}1 & \text{if } h_{x_{iq}}(k)=\max\limits_{1\le r\le p}\{h_{x_{ir}}(k)\}\\ 0 & \text{otherwise}\end{cases} \qquad (8)$$

$$y_{jq}(k+1)=\begin{cases}1 & \text{if } h_{y_{jq}}(k)=\max\limits_{1\le i\le n}\{h_{y_{iq}}(k)\}\\ 0 & \text{otherwise}\end{cases} \qquad (9)$$



Note that we introduce the group-update concept or assembly-update concept, that 
is, all neurons of the same group or assembly are updated at the same time. 



4 Neural Network Algorithm 

The following procedure describes the proposed neural network algorithm (NNA): 

1. Set the initial state by randomly setting the output of one neuron in each of the
n+p groups to one and all the other neurons in the group to zero.

2. Select a group g, where 1 ≤ g ≤ n + p.

3. Compute the inputs of the neurons in group g by expression (6) when 1 ≤ g ≤ n, or
by expression (7) otherwise.

4. Update the neurons by expression (8) when 1 ≤ g ≤ n, or by expression (9) when
n + 1 ≤ g ≤ n + p.

5. Repeat from step 2 until no more changes occur.

Clearly, this procedure is very similar to the dynamics of a recurrent neural network.
The network updates itself in a systematic way while the neurons are constrained to
represent a feasible solution. Feasibility is guaranteed, since every network
configuration is forced to be a feasible solution. Thus, the algorithm can be seen as
an efficient and convenient simulation approach for solving the proposed problem.
Furthermore, the network can still be implemented in hardware, making the potential
for rapid execution a further advantage.
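
The five steps can be sketched in software as follows. This is a minimal illustration
under several assumptions of ours, not the simulation code used for the experiments:
the one-hot matrices are stored as index lists (`alloc[i]` is the active allocation
neuron of node i, `hubs[q]` the active location neuron of cluster q), groups are
visited in a fixed sequential order, ties keep the current winner, and a cap
`max_sweeps` bounds the number of passes. All names and the 4-node instance are
hypothetical.

```python
import random


def nna(W, C, p, alpha=1.0, beta=1.0, gamma=1.0, max_sweeps=100, seed=0):
    """Group-wise winner-take-all updates; returns (alloc, hubs)."""
    rng = random.Random(seed)
    n = len(W)
    # Step 1: exactly one active neuron per group, chosen at random.
    alloc = [rng.randrange(p) for _ in range(n)]
    hubs = [rng.randrange(n) for _ in range(p)]

    def bracket(i, j, k):
        # Bracketed term of (6)/(7); with one-hot x and y the inner double
        # sum reduces to the cost from hub j to the current hub of node k.
        return (beta * W[i][k] * C[i][j] + gamma * W[k][i] * C[j][i]
                + alpha * W[i][k] * C[j][hubs[alloc[k]]])

    def h_alloc(i, q):  # expression (6)
        return -sum(bracket(i, hubs[q], k) for k in range(n))

    def h_loc(j, q):    # expression (7)
        return -sum(bracket(i, j, k)
                    for i in range(n) if alloc[i] == q
                    for k in range(n))

    def winner(h, current):
        # Keep the current winner on ties so that ties never count as changes.
        best = max(h)
        return current if h[current] == best else h.index(best)

    for _ in range(max_sweeps):      # assumption: bounded number of sweeps
        changed = False
        for g in range(n + p):       # Step 2 (assumption: sequential order)
            if g < n:                # Steps 3-4, allocation group of node g
                new = winner([h_alloc(g, q) for q in range(p)], alloc[g])
                changed |= new != alloc[g]
                alloc[g] = new
            else:                    # Steps 3-4, location group of cluster q
                q = g - n
                new = winner([h_loc(j, q) for j in range(n)], hubs[q])
                changed |= new != hubs[q]
                hubs[q] = new
        if not changed:              # Step 5: stop when nothing changes
            break
    return alloc, hubs


# Hypothetical 4-node instance with unit flows between distinct nodes.
W = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
C = [[0, 1, 4, 5], [1, 0, 3, 4], [4, 3, 0, 1], [5, 4, 1, 0]]
alloc, hubs = nna(W, C, p=2, alpha=0.5)
print(alloc, hubs)
```

Because each group always holds exactly one winner, constraints (2) and (3) are
satisfied at every step, which is the feasibility property described above.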



5 Simulation Results 

The data for this study are based on the well known CAB (Civil Aeronautics Board)
data set from the literature. Problems of size n = 10 and n = 15 are extracted from
this data set, and further problems are generated by varying the number of hubs
$p \in \{2, 3, 4\}$ and the transfer cost $\alpha \in \{0.2, 0.4, 0.6, 0.8, 1\}$. The
values of the collection and distribution coefficients are fixed at
$\beta = \gamma = 1$. Exact results are provided using the linear programming approach
of Ernst and Krishnamoorthy [5]. The results are compared with the Hopfield network
(HN) and the modified hill-climbing Hopfield network (HCHN) provided by Smith et
al. [8]. All algorithms were run on an Origin 2000 (Silicon Graphics Inc.)
multiprocessor with 16 MIPS R10000 CPUs running IRIX 6.5.



Table 1. List of airport names in the CAB database.

Atlanta              Miami
Baltimore            Minneapolis
Boston               New Orleans
Chicago              New York
Cincinnati           Philadelphia
Cleveland            Phoenix
Dallas-Fort Worth    Pittsburgh
Denver               St. Louis
Detroit              San Francisco
Houston              Seattle
Kansas City          Tampa
Los Angeles          Washington DC
Memphis


The results presented in Table 2 show that the proposed NNA is able to compete
effectively with the Hopfield neural network approaches proposed by Smith et al. [8]
in finding optimal or near-optimal solutions for the CAB data sets. The HN often
converges to a poor quality solution, since it becomes caught in the first local
minimum it encounters. The HCHN considerably improves the quality of the solutions,
with optimal solutions being found in 73% of the CAB problem instances. It is
important to remember that the neural network results presented in Table 2 are
simulations only, and are used to give an indication of the quality of solutions that
could be expected from a hardware implementation of the networks. However, the main
advantage of the proposed NNA over the Hopfield network approaches is the amount of
CPU time required to simulate it: the CPU times for the HN and HCHN simulations are
several orders of magnitude greater than those for the NNA. Moreover, the memory
requirements of the HN and HCHN simulations also make it difficult to obtain solutions
for problems larger than n = 20. This problem does not arise in a hardware
implementation, where an increase in problem size translates only into an increase in
the number of amplifiers and resistors.



6 Conclusions 

In this paper we have proposed a competitive recurrent neural network for airline
network design. We have considered the uncapacitated single allocation p-hub problem,
which is the most appropriate model in certain situations. We have proposed a new
formulation that reduces the number of variables and constraints with respect to the
formulations provided by several authors [2,5,7]. As an alternative neural solution
approach, we have proposed a competitive recurrent neural model. Although the network
results have been simulated on a digital computer, the proposed competitive neural
network requires fewer computational resources than the Hopfield network approaches
proposed by Smith et al. Moreover, the CPU times for the Hopfield network simulations
are several orders of magnitude greater than those for the proposed NNA.



Table 2. Results of CAB data sets for NNA, HN and HCHN.

                  Average Error (%)
 n    p    α      NNA      HN       HCHN
10    2    0.2    0.08     0.00     0.00
10    2    0.4    0.05     0.00     0.00
10    2    0.6    0.03     0.00     0.00
10    2    0.8    0.02     0.00     0.00
10    2    1      0.02     6.60     1.60
10    3    0.2    0.13     18.80    0.00
10    3    0.4    0.08     15.90    0.00
10    3    0.6    0.08     1.70     0.00
10    3    0.8    0.07     0.00     0.00
10    3    1      0.05     1.00     0.40
10    4    0.2    0.24     0.00     0.00
10    4    0.4    0.10     1.10     0.00
10    4    0.6    0.10     4.80     0.00
10    4    0.8    0.09     6.90     0.40
10    4    1      0.08     0.70     0.40
15    2    0.2    0.16     0.00     0.00
15    2    0.4    0.14     0.00     0.00
15    2    0.6    0.09     0.00     0.00
15    2    0.8    0.08     3.00     0.20
15    2    1      0.07     7.20     0.50
15    3    0.2    0.09     0.20     0.00
15    3    0.4    0.08     2.10     0.00
15    3    0.6    0.16     0.60     0.00
15    3    0.8    0.12     3.50     0.00
15    3    1      0.13     1.40     1.00
15    4    0.2    0.11     0.00     0.00
15    4    0.4    0.11     4.60     0.00
15    4    0.6    0.14     2.80     0.00
15    4    0.8    0.15     1.90     0.00
15    4    1      0.18     3.70     1.20




In the computational experiments, we have shown that the proposed neural model worked
well, with an average error of only 0.1% for the CAB data sets, and found optimal or
near-optimal solutions quickly. Therefore, the proposed recurrent neural network
might be used for large problems. While other heuristics are quite fast, neural
networks have the potential to solve large problems even faster by employing the
parallel hardware implementation for which they were designed.



Abstract. Multimedia communication in mobile and ad hoc networks
used by real time applications can be improved by adding intelligent and
adaptive capabilities. This new functionality will allow the applications to
adapt to constantly and unpredictably changing network conditions. As a
result of this adaptivity, the user will perceive a more or less constant
quality instead of the highly variable quality perceived in today's
applications. In this work, we maintain the following thesis: both machine
learning and intelligent agents will play an important role in improving
the applications mentioned above. Machine learning, by means of
reinforcement learning, will provide adaptivity. Intelligent agents will ease
P2P computation. This paper focuses on approaches for both topics.



1 Introduction 

A mobile ad hoc network (MANET) is a spontaneous association of terminals
equipped with wireless interfaces, which form a network. These networks do not
require any infrastructure, and all the network-layer functions need to be
distributed among the different nodes. For example, when two distant nodes
need to communicate, intermediate nodes act as relays so that multi-hop paths
can be created. Ad hoc nodes therefore act as both hosts and routers.
These networks are characterized by continuously and unpredictably changing
network conditions, mainly due to the movement of the nodes (which causes
topology changes) and to other issues at the lower layers such as fading,
collisions, etc. Traditional real-time multimedia applications are unable to
perform well over these networks, and some adaptive functionality is required at
the application layer to deal with such problems. These new applications, called
"adaptive applications", are equipped with new components to detect the current
network conditions and adapt their internal settings (e.g. audio codecs, video
rates, etc.) accordingly.

The main focus of traditional multimedia applications is the reduction of the
data rate when the network bandwidth becomes scarce, and the increase of the
data rate whenever more resources become available. Of course, this behaviour
improves the QoS perceived by the user. However, the relation between
user-perceived QoS and the data rate required to achieve that QoS is not linear.
So, when the network conditions become very bad, a correct change in the internal
application settings could greatly reduce the data rate while keeping the QoS
at an acceptable level. The main problem is that, to do so, these applications
have to be aware of the user's perception of QoS. This modeling is very
complex because it usually has subjective components which cannot be modeled
analytically.

* Work supported by the FIT-070000-2003-662, FIT-1603002003-41 and
TIC2002-04021-C02-01 projects.

In this work we propose a hybrid approach for the design of a mechanism in
charge of managing the configuration of a multimedia peer-to-peer application.
This configuration mechanism seeks the best user satisfaction. The hybridization
must be understood in terms of the different machine learning techniques
[10] we use. We first obtain an inductive model, using supervised learning,
to predict the user-perceived quality given concrete network conditions and
multimedia application settings. Then, once we are able to score a concrete
situation, we apply reinforcement learning [12] to learn a strategy for deciding
when and how to change the application settings, taking into account the score of
the inductive model.
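
The way the two learners fit together can be illustrated with a single value update in
which the reward is the score returned by the inductive model. The sketch below uses
tabular Q-learning purely as a familiar example; the text does not say which
reinforcement learning algorithm is meant here, and the state label, the action names
and the reward value 4.0 are hypothetical placeholders for a network condition, a
codec choice and a predicted user score.

```python
def q_update(Q, state, action, reward, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """One tabular Q-learning step; Q maps (state, action) to a value."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + learning_rate * (reward + discount * best_next - old)
    return Q[(state, action)]


# Hypothetical example: under low bandwidth, switching to the GSM audio codec
# leads to a situation that the inductive model scores with 4.0.
actions = ["GSM", "PCM"]
Q = {}
first = q_update(Q, "low_bw", "GSM", reward=4.0, next_state="low_bw", actions=actions)
second = q_update(Q, "low_bw", "GSM", reward=4.0, next_state="low_bw", actions=actions)
print(round(first, 3), round(second, 3))  # 0.4 0.796
```

Settings that repeatedly lead to well-scored situations accumulate higher values, so a
policy that picks the highest-valued action gradually favours them.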

Intelligent agents [14] will be used to ease the control of the P2P multimedia
applications. We also propose in this work the use of FIPA (Foundation for
Intelligent Physical Agents) agents to wrap the aforementioned elements and to
integrate them seamlessly in previously existing multimedia applications.

The rest of the paper is structured as follows: section 2 explains the problem
we are facing. Section 3 introduces the hybrid approach we have used to obtain
the adaptive mechanism. Section 4 briefly presents the agent architecture we use
to integrate the adaptivity in multimedia applications. Finally, section 5
presents the lessons learned in this work and the pending tasks.



2 Adaptive Multimedia Applications 

Quality of Service (QoS), as defined in ITU-T recommendation E.800 [2], is
“the collective effect of service performance, which determines the degree of
satisfaction of a user of a service”. It is characterized by a combination of service
performance factors such as operability, accessibility, retainability and integrity. 
Thus, it is clear that the user plays an important role in QoS evaluations. 

We start by introducing the application architecture [4]. Its main building
blocks appear in figure 1. The main items in this architecture are the
following: (a) multimedia application components such as audio, video, slides for
a remote presentation, etc., (b) the QoS signaling mechanism and (c) the
adaptation logic. The QoS signaling mechanism is the protocol in charge of sending
and receiving reports describing the network conditions from the other end. When
such a report is received, it is passed to the Adaptation Logic as an additional
input. The Adaptation Logic, in turn, is in charge of deciding which set of
parameters is best suited to the current network conditions.



Fig. 1. General adaptive architecture



The most difficult part for the Adaptation Logic is deciding which components
to adapt, and which settings to change, when the application is exceeding the
available bandwidth. Many adaptive applications to date just reduce data rates
to use less than the available bandwidth. However, they do not deal with the
main difficulty, which is making the correct decision while taking into
consideration its implications for the user's subjective perception.

3 Hybrid Learning for P2P Control 

In this section we introduce the scheme we have designed to perform hybrid
learning. Section 3.1 introduces the supervised learning part and section 3.2
presents the reinforcement learning part.

3.1 Supervised Learning for User Modeling 

In order to inductively model the QoS perception of a user, we have to produce 
a learning data set. It has to be composed of examples of situations, each referring 
to a particular network condition and a particular multimedia application 
sending data to and receiving data from the network. Network conditions have been 
reproduced by using a reflector. This is a software tool placed in the middle 
of a dedicated link between two communicating nodes. It is in charge of 
simulating different levels of available bandwidth and estimating packet losses. 
The multimedia application used, ISABEL-Lite, is a reduced version of ISABEL 
[1] which allows both manual and automatic change of its settings. These settings 
must be understood in terms of audio and video codecs, which 
are in charge of capturing, coding, sending, decoding and presenting audio and 
video data respectively. More specifically, for the video we can also specify the 
frame size, the number of frames per second sent and a quality factor. 

Table 1 summarizes all the attributes, with their corresponding ranges of values, 
that make up the data set used to model the user. Notice that the last row 
refers to the score given by the user. 

The data set consists of 864 instances, each one scored by a user. The data 
set can be considered to be balanced, with the following distribution of examples 

Table 1. Parameters appearing in the example set used for rule induction by SLIPPER 

Parameter   Values                    Explanation 
BW          {33, ..., 384}            Limit of network bandwidth 
LOSS        0..100                    Percentage of lost packets 
AUDCOD      PCM, G711-U, G722, GSM    Audio codec 
VIDCOD      MJPEG, H.263              Video codec 
FSIZE       CIF, QCIF, 160x128        Size of video frames 
QFVIDEO     5, 10, 15, 30, 60         Quality factor of video codec 
FPS         {0, ..., 12}              Frames per second sent 
QoS         1, 2, 3, 4, 5             User perceived quality 

by score: 241 (27.8%) examples with score 1, 83 (10.4%) with score 2, 181 (20.9%) 
examples with score 3, 233 (26.9%) with score 4 and, finally, 125 (14.46%) with the 
highest score. 

Learning experiments have been performed using SLIPPER [6]. This algorithm 
does not directly use the classic divide and conquer search bias for rule 
induction. Instead, it bases its strategy on boosting [7]. It uses a weak learner 
(i.e. a very simple rule induction algorithm), which is boosted by modifying the 
probability of the learning instances at each iteration in order to focus on the 
instances not yet correctly classified. We also tested IREP, IREP* [8] and RIPPER [5], 
algorithms which do not use boosting, and all of them under-performed SLIPPER. We will 
consider the possibility of using other kinds of algorithms such as, for example, 
ordinal regression ones. These algorithms predict discrete outputs taking into account 
a given order between values. 

The model with the best classification capacity that we have obtained with SLIPPER 
appears in figure 2. Ten-fold cross-validation gave a misclassification error of 10%. 
The model is composed of 12 rules, and an example is classified as belonging to the 
first class (scanning the rule sets from top to bottom) in which the sum of the 
confidence values of the matching rules exceeds the magnitude of the corresponding 
negative default value, i.e. in which the total confidence is positive. 
Using that model we can approximate the user perceived QoS. However, what we 
really need is a mechanism to decide, when things go wrong in the session, what 
changes have to be applied to the application settings. For that, we will use 
reinforcement learning. 
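
The classification procedure just described can be sketched in a few lines of Python. This is only an illustrative transcription of the rule model of figure 2, not the SLIPPER implementation: the names `RULESETS`, `holds`, `match_confidence` and `estimate_qos` are hypothetical, and the condition FSIZE <= 2 presumes a numeric encoding of the frame size that the text does not give, so here it simply fails to match when FSIZE is a label.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "=": operator.eq}

# (score, [(conditions, confidence), ...], default confidence), in the order of figure 2.
RULESETS = [
    (5, [
        ([("QFVIDEO", ">=", 60), ("VIDCOD", "=", "MJPEG"), ("FSIZE", "=", "QCIF"),
          ("LOSS", "<=", 10), ("FPS", ">=", 6)], 2.8792),
        ([("AUDCOD", "=", "GSM"), ("BW", ">=", 80), ("QFVIDEO", ">=", 30),
          ("FSIZE", "=", "QCIF"), ("FPS", "<=", 6)], 1.4357),
        ([("AUDCOD", "=", "GSM"), ("BW", ">=", 128), ("LOSS", "=", 0),
          ("QFVIDEO", ">=", 30), ("FPS", ">=", 3), ("VIDCOD", "=", "MJPEG")], 1.7013),
    ], -2.4188),
    (4, [
        ([("BW", ">=", 384), ("QFVIDEO", ">=", 40), ("FSIZE", "<=", 2)], 2.7121),
        ([("QFVIDEO", ">=", 30), ("VIDCOD", "=", "MJPEG"), ("LOSS", "<=", 3),
          ("AUDCOD", "=", "G722")], 1.1756),
        ([("FSIZE", "=", "CIF"), ("QFVIDEO", ">=", 30), ("LOSS", "<=", 3),
          ("AUDCOD", "=", "G722"), ("BW", ">=", 80)], 1.4437),
    ], -1.5044),
    (1, [
        ([("LOSS", ">=", 30)], 2.1188),
        ([("QFVIDEO", "<=", 5)], 1.4142),
        ([("LOSS", ">=", 16), ("FPS", "<=", 3)], 1.5438),
    ], -1.0984207275826066),
    (2, [
        ([("LOSS", ">=", 16)], 1.9109),
        ([("QFVIDEO", "<=", 10), ("FSIZE", "=", "QCIF")], 1.5861),
        ([("FSIZE", "=", "160X128"), ("QFVIDEO", "<=", 40),
          ("VIDCOD", "=", "H.263")], 1.2546),
    ], -0.3953),
]
FALLBACK_SCORE = 3  # the final "else 3" branch of figure 2


def holds(example, attr, op, value):
    """Check one condition; a type mismatch (label vs. number) counts as no match."""
    try:
        return OPS[op](example[attr], value)
    except TypeError:
        return False


def match_confidence(example, rules, default):
    """Sum the confidences of all matching rules plus the (negative) default."""
    total = default
    for conditions, confidence in rules:
        if all(holds(example, *cond) for cond in conditions):
            total += confidence
    return total


def estimate_qos(example):
    """Return the first score whose rule set yields a positive total confidence."""
    for score, rules, default in RULESETS:
        if match_confidence(example, rules, default) > 0:
            return score
    return FALLBACK_SCORE


lossy = {"BW": 128, "LOSS": 40, "AUDCOD": "PCM", "VIDCOD": "H.263",
         "FSIZE": "CIF", "QFVIDEO": 15, "FPS": 5}
clean = {"BW": 384, "LOSS": 0, "AUDCOD": "GSM", "VIDCOD": "MJPEG",
         "FSIZE": "QCIF", "QFVIDEO": 60, "FPS": 8}
print(estimate_qos(lossy), estimate_qos(clean))
```

With these two hypothetical sessions, the lossy one matches only the rule LOSS >= 30 of the score 1 rule set, while the clean one matches two rules of the score 5 rule set.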

3.2 Reinforcement Learning for Adaptivity 

In this section we introduce our approach to obtaining the adaptivity scheme. 
This scheme is in charge of deciding when and how to change the configuration 
of our multimedia application to obtain, in the long term, optimum user 
satisfaction. The decision model we will obtain is a multi-layer perceptron 
[3]. Its parameters (i.e. the weights of the arcs in the network) are learnt 
by reinforcement learning [13]. 

In reinforcement learning, we make use of an entity called agent [12] which 
is situated in the environment. Its situation is described by the particular state of 
the environment at time t, denoted by $s_t$. The learner agent can perform 
a set of actions in the environment. Each time the agent executes an action a, it 
receives a reward r from the environment. The agent then has to appropriately 

if matchConfidence { 
    [QFVIDEO >= 60, VIDCOD = MJPEG, FSIZE = QCIF, LOSS <= 10, FPS >= 6] -> 2.8792 
    [AUDCOD = GSM, BW >= 80, QFVIDEO >= 30, FSIZE = QCIF, FPS <= 6] -> 1.4357 
    [AUDCOD = GSM, BW >= 128, LOSS = 0, QFVIDEO >= 30, FPS >= 3, VIDCOD = MJPEG] -> 1.7013 
    [] -> -2.4188 
} > 0 then 5 else if matchConfidence { 
    [BW >= 384, QFVIDEO >= 40, FSIZE <= 2] -> 2.7121 
    [QFVIDEO >= 30, VIDCOD = MJPEG, LOSS <= 3, AUDCOD = G722] -> 1.1756 
    [FSIZE = CIF, QFVIDEO >= 30, LOSS <= 3, AUDCOD = G722, BW >= 80] -> 1.4437 
    [] -> -1.5044 
} > 0 then 4 else if matchConfidence { 
    [LOSS >= 30] -> 2.1188 
    [QFVIDEO <= 5] -> 1.4142 
    [LOSS >= 16, FPS <= 3] -> 1.5438 
    [] -> -1.0984207275826066 
} > 0 then 1 else if matchConfidence { 
    [LOSS >= 16] -> 1.9109 
    [QFVIDEO <= 10, FSIZE = QCIF] -> 1.5861 
    [FSIZE = 160X128, QFVIDEO <= 40, VIDCOD = H.263] -> 1.2546 
    [] -> -0.3953 
} > 0 then 2 else 3 



Fig. 2. Rule model to estimate user perceived QoS 



choose each action it executes in order to maximize the reward obtained in the 
long term. In the context of this particular application, the agent will learn a 
state-value function, denoted by $V^{\pi}(s_t)$. This function is used 
to predict the long term reward the agent would obtain if, being at time t, it 
selects the action given by the policy $\pi$ (i.e. the criterion used to select an action 
among all the possible ones). This approach is typically used for prediction but 
we will use it for control. In typical control problems, not only the state is taken 
into account in the value function but also the actions. This time, the agent has 
to learn a good approximation of a function, denoted by $Q^{\pi}(s, a)$, for 
the current policy $\pi$ and for all states s and actions a. 

Learning is done by iteratively updating the Q function by means of the 
following expression: 

$$Q(s, a) \leftarrow Q(s, a) + \alpha [r + \gamma Q(s', a') - Q(s, a)],$$ 

where the pair (s', a') refers to the state s' that the environment moves to 
from s when action a is executed, and a' is the action executed at state s'. 
The constants $\alpha$ and $\gamma$ are the learning rate and the discount factor, respectively. 
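
As a minimal sketch of this update rule, the following Python fragment applies one SARSA step to a tabular Q function. The table is a simplifying assumption made for illustration (the paper approximates Q with a multilayer perceptron), the state and action names are hypothetical, and the default values 0.05 and 0.9 are the learning rate and discount factor reported later in this section.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.05, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]."""
    td_error = r + gamma * q[(s_next, a_next)] - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return q[(s, a)]


# Tabular Q function initialised to zero (hypothetical states and actions).
q = defaultdict(float)
sarsa_update(q, "s0", "set_fsize_qcif", r=4.0, s_next="s1", a_next="do_nothing")
print(q[("s0", "set_fsize_qcif")])
```

Starting from an all-zero table, a reward of 4 moves the estimate for the visited pair by alpha times the reward, since the bootstrap term is still zero.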

In this particular domain, actions act directly on the application settings. For example, a 
possible action could be to set the video codec to MJPEG, to change the 
video size to QCIF, or even to do nothing at all. Immediate rewards, obtained from 
executed actions, are approximated by using the rule model of figure 2. 

The estimator is learnt by using SARSA, without the eligibility traces 
mechanism (i.e. $\lambda = 0$, see [12], p. 163). Eligibility traces speed up convergence to 
a good approximation of the Q function; however, in previous simulations we did 
not perceive any improvement from using that technique. 

A world state is given by the concrete network conditions as simulated by 
the reflector (i.e. packet losses) and by the settings of the multimedia application. 
Consequently, an action is a change in the state (except for packet losses, which 
are given by the reflector). It must be noticed that the available bandwidth 
cannot be considered as another dimension of the environment feature vector, 
as this parameter cannot be obtained from the real application. It can only be 
simulated with the reflector. A functional scheme, in terms of module inputs 
and outputs, is depicted in figure 3. 

[Figure 3 labels: change; Estimation of long term QoS perceived by the user] 

Fig. 3. Functional diagram for the adaptive strategy to be learnt by SARSA 

Fig. 4. Bandwidth evolution used in SARSA trials 

Again, we have used the ISABEL Light Videophone along with the reflector 
to reproduce the real application and network conditions the agent learns 
from. The videophone simulates both communication end points. It plays, 
in an endless loop, a musical clip with a total of 400 video frames, which it sends to 
itself through the reflector. The reflector, in turn, is in charge of simulating the 
network link between the end points. Each of the learning trials, or episodes, uses the 
same sequence of bandwidth values changing through time. Bandwidth follows the curve 
appearing in figure 4. Notice that the density of different bandwidth values grows 
as the values fall between 256 and 0KB. That is because this range of 
values is more sensitive to changes. Depending on the amount of data injected 
into the network by the videophone and on the bandwidth simulated by the reflector, the 
reflector generates a concrete packet loss percentage. Both the videophone 
and the reflector accept commands from outside through a socket. By opening 
telnet connections to the videophone we can configure the application settings. For 
example, with the string set (AUDIO::PCM, VIDEO::MJPEG,QCIF,8,5.0) we tell 
the videophone to set the audio codec to PCM, the video codec to MJPEG, the 
video size to QCIF, the frame rate to 8 and the quality factor to 5. Bandwidth can 
be set at the reflector in the same way. We can also read packet losses from the 
reflector by using a read command on the same socket. Communication through 
sockets is important here because the learning scheme has been developed 
in Java, whereas the reflector and the videophone are developed in C++. The global 
learning scheme appears in figure 5. 
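
The socket-based control described above might look as follows in Python. This is a sketch under stated assumptions: the command layout is copied from the single example string given in the text (read as `set (AUDIO::PCM, VIDEO::MJPEG,QCIF,8,5.0)`), while the function names, the newline terminator and the reply handling in `send_command` are hypothetical, since the actual wire protocol of the videophone and the reflector is not documented here.

```python
import socket


def build_set_command(audio_codec, video_codec, frame_size, fps, quality):
    """Format a settings command following the example string in the text."""
    return (f"set (AUDIO::{audio_codec}, "
            f"VIDEO::{video_codec},{frame_size},{fps},{quality:.1f})")


def send_command(host, port, command, timeout=5.0):
    """Send one command line over TCP and return the raw reply.

    The newline terminator and the single recv() call are assumptions.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((command + "\n").encode("ascii"))
        return sock.recv(4096).decode("ascii", errors="replace")


command = build_set_command("PCM", "MJPEG", "QCIF", 8, 5.0)
print(command)
```

Only `build_set_command` is exercised here; `send_command` would need a running videophone or reflector listening on the given host and port.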



[Diagram: the Videophone sends multimedia data to the Reflector and receives the 
reflected multimedia data; the Reinforcement Learning Algorithm makes configuration 
changes on the Videophone, configures the bandwidth of the Reflector and reads 
packet losses from it.] 

Fig. 5. Global learning architecture with videophone, reflector and SARSA algorithm 



We have obtained the best results with a multilayer perceptron of 30 hidden 
nodes, a reinforcement learning rate of 0.05 and a discount factor of 0.9, with 
a learning rate of 0.01 for the neural network. Each episode 
consisted of 970 steps (10 actions for each different bandwidth value 
appearing in figure 4). 

Effective learning progress is demonstrated by a number of different 
curves. The first one corresponds to the accumulated r values obtained in 
each episode, over all visited states $s_{t+1}$. It is labeled (a) in figure 6. Notice 
that, from the very beginning, it grows until it becomes stable. Another interesting 
point is shown by curve (b), which refers to packet losses. In the beginning, 
it shows a minimum level of packets being lost in the network. This is because 
the learner stays in very conservative states (i.e. states using a low bandwidth 
in which, consequently, low user scores are obtained). As the learning process 
evolves, the packet loss ratio becomes stable between 3% and 5% (an acceptable 
value). The RMSE decreases, as expected. Regarding the five curves appearing in 

Fig. 6. The curve labeled (a) shows the evolution of accumulated rewards through 
episodes. The one labeled (b) represents the packet loss percentage. Curve (c) 
is the evolution of the Root Mean Squared Error (RMSE) for the multilayer 
perceptron which approximates the state-value function. Graph (d) shows the number 
of times each different score was obtained during learning 



graph (d), we can see how the curves which represent low scores (scores 1 and 2) 
decrease. Also, the number of actions with score 3 increases, 
as does the number of good and very good quality actions. 

4 P2P Control Implementation with FIPA Agents 

In this section we explain an initial implementation we have developed for the 
adaptive control of the videophone. Our long term goal is to build a complete 
ambient intelligence [9] system. This concept emphasizes the context in which 
the user is situated. This context depends on the device through which the user is 
connected to the system, the physical network, the user's personal interests (i.e. the 
user profile) and the services available in the current context. At present, we are 
working on the first two factors. 

If we want to provide an implementation of the adaptive level of the architecture 
(see figure 1) based on intelligent agents, we have to revise the basic QoS 
signalling mechanism along with the adaptation logic (a detailed analysis of the 
issue can be found in [11]). 

The QoS signalling mechanism provides information about the network 
state to the other end point of the communication channel. A point to point 
transport mechanism can be defined which is in charge of informing the transmitter about 
the quality the receiver is obtaining. This QoS can be composed of the packet 
loss ratio and the mean jitter. A sequence number and the estimated bandwidth can be 
added to each of the QoS signalling packets. We can express this 
kind of packet in XML, as in the following example: 

<?xml version="1.0" encoding="UTF-8"?> 
<qosreport> 
  <sequence>34</sequence> 
  <lostpackets>9.3465</lostpackets> 
  <delay>0.093</delay> 
  <preferences></preferences> 
  <estimatedbw>128000</estimatedbw> 
</qosreport> 



in which we have included an empty preferences part. Agents simply have to 
exchange messages like that, by using a performative to convey them. 
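
A receiving agent could decode such a report as sketched below, using Python's standard `xml.etree.ElementTree` module. The element names come from the example above; the function name `parse_qos_report`, the dictionary layout and the numeric types chosen for each field are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

REPORT = b"""<?xml version="1.0" encoding="UTF-8"?>
<qosreport>
  <sequence>34</sequence>
  <lostpackets>9.3465</lostpackets>
  <delay>0.093</delay>
  <preferences></preferences>
  <estimatedbw>128000</estimatedbw>
</qosreport>"""


def parse_qos_report(data):
    """Turn a <qosreport> document into a dictionary of typed fields."""
    root = ET.fromstring(data)
    return {
        "sequence": int(root.findtext("sequence")),
        "lostpackets": float(root.findtext("lostpackets")),
        "delay": float(root.findtext("delay")),
        # An empty <preferences/> element yields an empty string.
        "preferences": root.findtext("preferences") or "",
        "estimatedbw": int(root.findtext("estimatedbw")),
    }


report = parse_qos_report(REPORT)
print(report)
```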

At present, the implementation is being carried out on laptops, using 
videophones coded in C++ and intelligent agents running on the JADE platform. 

The adaptation logic is in charge of deciding when and how to act on the application 
settings. This functionality is given by the multilayer perceptron that, for each 
action, produces an estimate of how good it will be in the long term. The decision 
mechanism consists of choosing the action with the highest estimated return. When 
an inform message is received with a <qosreport> content, we evaluate the neural 
network and apply the resulting changes. 
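
The decision mechanism amounts to a greedy choice over the estimated returns, which can be sketched as follows. The action names, the toy estimates and the function `toy_estimator` are hypothetical stand-ins; in the system described here the estimate would come from the trained multilayer perceptron.

```python
def choose_action(state, actions, estimate_return):
    """Return the action whose estimated long term return is highest."""
    return max(actions, key=lambda action: estimate_return(state, action))


# Hypothetical actions and estimates standing in for the neural network output.
ACTIONS = ["do_nothing", "set_vidcod_mjpeg", "set_fsize_qcif"]
TOY_ESTIMATES = {"do_nothing": 2.1, "set_vidcod_mjpeg": 3.4, "set_fsize_qcif": 2.9}


def toy_estimator(state, action):
    return TOY_ESTIMATES[action]


best = choose_action({"LOSS": 9.3465}, ACTIONS, toy_estimator)
print(best)
```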

5 Conclusions 

In this work, we have presented a hybrid approach based on both supervised 
and reinforcement learning. It has been used to obtain an adaptation mechanism 
that maintains an acceptable QoS in the context of multimedia applications 
such as a videophone. We have also outlined initial details of the FIPA agent based 
architecture intended to provide a complete ambient intelligence application. Results 
can still be improved. A possible improvement is to use ordinal regression models 
instead of classification ones to approximate the quality perceived by the user. 
In this way, error estimations would be more precise, as score labels are ordered. 
Another possibility is to use simple regression for the same problem. 
In any case, this work is a very promising starting point 
with respect to the role that artificial intelligence will play in the improvement 
of ad hoc network communication. 

Abstract. In this work, we present an approach to simplify Knowledge Acquisition 
Processes (KAPs) by extracting knowledge directly from natural 
language texts. The ultimate goal is to acquire knowledge straight from experts’ 
language. This approach uses a morphological analyzer to improve the 
setting-in-a-context between knowledge elements (e.g., concepts and attributes). Another 
objective is achieving language independence. Here, the knowledge acquired 
from texts is represented by means of ontologies. 



1 Introduction 

Extracting knowledge directly from natural language text is a challenging task. It 
would allow knowledge to be extracted easily and, what is more significant, without the 
intervention of knowledge engineers. Our ultimate goal is the development of tools 
capable of extracting knowledge from text, on the one hand, and interacting directly 
with experts of any application domain, on the other. To do this, we agree with [3] in 
the sense that people who know a language should in part know the rules of that 
language. In particular, we account for this assumption in designing and implementing 
a morphological analyzer. This paper presents a technique for generating knowledge 
from text through the combination of knowledge modelling techniques and natural 
language processing, two disciplines that have been following different roads. The 
main idea behind the approach presented here is simple: the system stores the knowledge 
found by the expert in order to identify this knowledge automatically whenever it 
reappears. Knowledge has been represented in this work by means of ontologies. In the 
literature, ontologies are commonly defined as specifications of domain knowledge 
conceptualisations [15]. Due to the nature of an ontology, there is no unique (valid) 
manner of defining ontologies [8]. Moreover, several definitions have historically 
been given to the term ontology, although an ontology is commonly considered to be 
an enumeration of the relevant concepts in an application area, as well as a definition 
of classes of concepts and relationships among these classes [4]. A series of functions 
to capture knowledge has been implemented to represent the knowledge acquired 
through ontologies. 

The paper is structured as follows. Section 2 presents an overview 
of the approach. Sections 3 and 4 describe the algorithms 
used for parsing natural language texts, indicating also how to set words into 
their corresponding context. In Section 5, an example of the application of the 
framework is described. Section 6 explains the system implementation and its application 
to a real domain. Related work is discussed in Section 7. Finally, Section 8 presents 
some conclusions. 



2 Overview of the Approach 

The aim of this work was to implement a system able to extract knowledge from natural 
language texts. More precisely, we have focused on building an ontology from a 
text. Ontologies allow knowledge to be divided into categories such as concepts, 
attributes, relationships, rules, axioms, etc. These knowledge entities can appear explicitly 
in the text, although sometimes knowledge is only referred to implicitly. Thus, the 
approach attempts to find explicit knowledge entities in the text. The starting point 
is an empty knowledge base. In this phase, the system is unable to find knowledge in 
the text and the expert has to introduce knowledge manually. However, experts do not 
just find knowledge in a single fragment; they also identify the expressions from which 
that knowledge can be derived. The expert identifies all the knowledge entities of the 
fragment and also tells the system the expressions in which they appear. These 
expression-knowledge associations are stored by the system in order to be reused for 
new knowledge findings thereafter. The expert only has to identify these associations 
once; from that moment onwards, the system performs the identification automatically, 
and the expert’s task just consists of confirming the results output by the system. In 
principle, we might think that in a system with a huge knowledge base, the expert just has 
to take a seat, divide the text into smaller fragments and confirm the system’s 
proposals. Unfortunately, the process is not that simple. The system checks the fragments 
for expressions with already associated knowledge. A word with associated knowledge 
may appear in plural or singular, be replaced by a pronoun, etc., and verbs may appear in 
different inflected forms (number, tense, etc.). 

In a huge knowledge base, expressions with multiple knowledge associations are 
likely to be found, as words can have various meanings. These meanings refer to other 
knowledge pieces in the knowledge base. For instance, an attribute does not exist on 
its own; it refers to a conceptual entity. A relationship implies the existence of at least 
two knowledge entities. Thus, the system has to identify knowledge in fragments as 
well as the knowledge referenced by it. The process reveals some problems: (1) searching 
for meaningful expressions in a text; (2) deciding what to do when an expression has 
more than one knowledge association in the knowledge base; and (3) identifying 
knowledge referred to by “non-concepts”. The first two problems are faced here in the 
search phase, whereas the third one is dealt with in the setting-in-a-context phase, for 
which the system uses a morphological analyzer. The morphological analyzer uses the 
learning algorithm C4.5 [10] to classify each word of a sentence. For each word, the 
instance related to it is obtained, and this word is then classified by C4.5. Further 
details about this analyzer may be found in [12]. 

3 Parsing Text and Looking for Knowledge 

The starting point for using the approach is a set of expressions in the current 
fragment with no associated knowledge. The system completes a full cycle once all words 
of the current fragment have been analyzed. Next, it takes the remaining non-analyzed 
words of the text fragment (current words) and looks for similar words in the 
expressions already existing in the knowledge base. Then, for each expression of the 
knowledge base similar to the current word, if it is considered to be an acceptable 
expression, these actions are performed: (1) obtain and sort the knowledge associated with 
the expression present in the knowledge base; (2) create a new expression that matches 
the knowledge base expression and associate the previously sorted knowledge 
with that expression; and (3) add the new expression to the list of fragment expressions 
with its associated knowledge. 

When no good options are found, the user has to be given the possibility of 
defining new knowledge associated with the expression. Alternatively, these expressions 
might simply be ignored (e.g., prepositions, particles, conjunctions, interjections, 
pronouns, and determiners). The similar function is in charge of identifying 
which expressions of the knowledge base are similar to the current word of the 
fragment. In its simplest case, it would be an “equal” function. Nevertheless, this function 
cannot deal with compound expressions as such; therefore a new function is needed, 
namely “isPrefix”, which checks whether the current word is an initial substring 
(prefix) of another expression or not. It would also be desirable for the function to deal 
with word families (types associated with a single lemma/lexeme) and other language 
peculiarities. For instance, if the expression “causes” already exists in the knowledge 
base and the current fragment contains the word “caused”, it would be desirable for the 
system to realize that both words actually allude to the same verb (lemma). This issue 
might be partially addressed using part-of-speech taggers and lemmatisers. Here, a word in 
the current fragment is “similar” to an expression in the knowledge base if the 
expression starts with the current word. 
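
A minimal Python sketch of this prefix-based “similar” function is given below. The function names mirror the ones used in the text, but the code itself, the case-insensitive comparison and the list `KNOWLEDGE_BASE` (filled with expressions from the example of Section 5) are illustrative assumptions rather than the authors' implementation.

```python
def is_prefix(word, expression):
    """True if the knowledge base expression starts with the current word."""
    return expression.lower().startswith(word.lower())


def similar(word, knowledge_base):
    """Return the knowledge base expressions that start with the current word."""
    return [expr for expr in knowledge_base if is_prefix(word, expr)]


# Expressions taken from the example knowledge base of Section 5.
KNOWLEDGE_BASE = ["is", "is a", "is a class of", "is a part of", "phenotype",
                  "tight", "tight mutant", "type", "leaky mutant", "non-wild"]

print(similar("tight", KNOWLEDGE_BASE))
print(similar("is", KNOWLEDGE_BASE))
```

Because the test is purely a string prefix check, a very short word such as the article “a” would match every expression starting with that letter, which is why a stricter filter is needed afterwards.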

The acceptable function is an extension of the “similar” one. It is introduced to 
determine whether the current word and a similar expression are not just “similar by 
chance”. The “isPrefix” function has an important drawback: if the current word is the 
article “a”, any expression starting with “a”, such as “assurance”, “added value”, 
“a hundred” or “advert”, will be considered as a similar candidate. Therefore, the 
acceptable function limits the number of acceptable options amongst the similar ones. 
This function has been designed with strong requirements: an existing expression in the 
database is acceptable if it appears as such in the current fragment. 

Current words in a text fragment are always single constituents. However, database 
expressions can contain more than one word (multiple-word expressions). If a word is 
acceptable, then the current fragment will contain all the words of the database ex- 
pression. Thus, the current word needs to be enlarged to cover all the words of the 
database expression, creating a new object that contains all the words. 
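
The acceptability test and the enlargement of the current word can be sketched as follows. The function names `acceptable`, `best_expression` and `enlarge` are hypothetical, and preferring the longest acceptable expression is an assumption made here to reproduce the behaviour of the example in Section 5, where “tight mutant” is chosen over “tight”.

```python
def acceptable(expression, fragment_words, position):
    """True if the whole expression appears as such in the fragment at position."""
    expr_words = expression.lower().split()
    window = [w.lower() for w in fragment_words[position:position + len(expr_words)]]
    return window == expr_words


def best_expression(candidates, fragment_words, position):
    """Pick the longest acceptable candidate, or None if there is none."""
    accepted = [c for c in candidates if acceptable(c, fragment_words, position)]
    if not accepted:
        return None
    return max(accepted, key=lambda c: len(c.split()))


def enlarge(fragment_words, position, expression):
    """Return the fragment words covered by a multiple-word expression."""
    return fragment_words[position:position + len(expression.split())]


FRAGMENT = "A tight mutant is a mutant which displays its non-wild type phenotype".split()

print(best_expression(["tight", "tight mutant"], FRAGMENT, 1))
print(best_expression(["is", "is a", "is a class of", "is a part of"], FRAGMENT, 3))
print(enlarge(FRAGMENT, 1, "tight mutant"))
```

Comparing whole words rather than characters is what rules out the accidental matches: the article “A” at position 0 does not make “assurance” acceptable.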

The correctly recognized relationships between expressions and their associated 
knowledge are stored in the database. Once an expression is obtained holding certain 
properties (through the functions similar and acceptable), knowledge associated with 
that expression is searched for in the database. Whenever different association possi- 
bilities in the database exist, the system sorts them out and displays them. 



An Approach for Ontology Building from Text Supported by NLP Techniques 129 



Whenever different possibilities are considered as inferred knowledge from an 
expression, the system rearranges and sorts them according to the following criteria: (1) 
person dependency, i.e. who recognized the knowledge; (2) domain dependency, i.e. the 
type of domain; and (3) spatial location, i.e. whether the expression belongs to the same 
fragment and/or text. In particular, there are currently 11 different possible sorting 
criteria. Once knowledge sorting has concluded, the search phase ends. At this point, the 
system will have processed both the current fragment and the set of expressions 
present in its database. Additionally, the inferred knowledge will have been sorted 
according to the above criteria in an attempt to overcome ambiguity. 



4 The Setting-in-a-Context Phase 

Once the search phase has been performed, the system has a list of expressions 
with associated knowledge. However, the system’s task does not finish then, unless 
the inferred knowledge is a concept; if the inferred knowledge is a different knowledge 
entity (i.e., attribute, value, relation) some operations need to be performed. In 
what follows, we explain the operations that need to be performed for the different 
knowledge entities. In other works, the conceptualisation was very limited and was 
dependent on a concrete language, namely English, where attributes usually follow 
concepts (see for instance [13]). This property was used by the system to look for the 
concepts which attributes belong to. So, when the system found an attribute in the search 
phase, it searched for the nearest concept to its left in the current fragment. 
However, that is not always correct. For example, in the fragment “... due 
to the weight of the table”, the concept table is to the right of its attribute weight. 
Therefore, with the previous process, the conceptualisation would be incorrect. In 
order to solve conceptualisation problems, we use grammar patterns, which are specific 
to each language and indicate a relation between words by knowing only their 
grammatical category. Thus, by using these patterns we can move this process towards 
language independence. The grammar patterns used for English in this work, which are 
shown in Table 1, are based on the ones presented in [14]. 

We say “property” because the existing relation between two words is a priori 
unknown (e.g. a word may be an attribute of a concept, or a value of an attribute). Once 
the search phase has finished, the system will perhaps have several attributes, concepts, 
and values. When the system has to find the relations between those knowledge 
entities, it makes use of such patterns. For example, in the fragment “... the red car 
...”, if the system has tagged red as a value and car as a concept in the search phase, by 
using the pattern ‘Adj + Noun’ the system will find a relation between the concept car 
and the value red. All relations are assumed to be binary. That is, two elements need 
to be found. Let us consider this fragment now: “... antioxidants inhibit the activation 
of the NF-kB transcription factor ...”. This type of structure is quite frequent when 
identifying participants in relations: one of the candidates is on the left hand-side of 
the expression from which the relation is inferred, and the other one on the right 
hand-side. The system searches for expressions with inferred knowledge on the left and 
right hand-sides, and candidates are selected according to various criteria: 

• If the current expression is associated with a relation of the type “is-a” or “part-of”, 
not just any ontological category can be chosen as a candidate, as these relations can 
only exist between concepts. Therefore, the system searches for two concepts, one on 
the left and another on the right hand-side of the current expression. It is very rare 
for any of the candidates of a relation to be a value (the system is designed to ignore 
values). 

• If an attribute is found, the process of searching for a related concept is the same as 
the one described above to provide a context for attributes. 

The search process is similar to the one described in previous sections. Candidates are 
searched (1) in a pre-determined number of expressions for which the user has associ- 
ated knowledge, (2) in the expressions obtained in the search phase and, finally (3) in 
the user expressions. 



Table 1. Grammar patterns. 

Previous word(s)   Current word   Relation                                             Example 
Adjective          Adjective      The previous word is a property of the current one   Sweetie lovely; Diffusible binding 
Adverb             Adjective      The previous word is a property of the current one   Very popular 
Adverb             Adverb         The previous word is a property of the current one   Very strongly 
Adjective          Noun           The previous word is a property of the current one   Tall boy; Individual gene 
Noun               Noun           The previous word is a property of the current one   Telephone directory; Gene activity 
Noun+prep+(det)    Noun           The current Noun is a property of the first one      The table of wood; Model of the molecule 
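
As an illustration of how such patterns can be applied, the following Python sketch scans a tagged fragment for adjacent word pairs whose grammatical categories match a pattern. Only the two-word patterns that the running text itself relies on (Adjective + Noun and Noun + Noun) are encoded; the names `PATTERNS` and `relate` are hypothetical, and the Noun+prep+(det) pattern is left out for brevity.

```python
# (category of previous word, category of current word) pairs for which
# the previous word is taken to be a property of the current one.
PATTERNS = {
    ("Adjective", "Noun"),
    ("Noun", "Noun"),
}


def relate(tagged_words):
    """Return (property, owner) pairs for adjacent words matching a pattern."""
    relations = []
    for (prev_word, prev_tag), (cur_word, cur_tag) in zip(tagged_words, tagged_words[1:]):
        if (prev_tag, cur_tag) in PATTERNS:
            relations.append((prev_word, cur_word))
    return relations


red_car = [("the", "Determiner"), ("red", "Adjective"), ("car", "Noun")]
print(relate(red_car))
```

For the fragment “the red car”, the only matching pair is the adjective followed by the noun, so red is reported as a property of car.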



5 Example of Knowledge Acquisition Process 

In this section, we introduce an example to describe the operation of the knowledge 
acquisition process. We will suppose that we have the knowledge base shown 
in Table 2 and that we are processing the following text fragment: “A tight mutant is 
a mutant which displays its non-wild type phenotype distinctly and clearly while a 
leaky mutant displays a much less distinct phenotype compared to wild type”. First, we 
will show the results of the morphological analysis, and then we will illustrate how 
knowledge is found in this fragment. 



Table 2. Knowledge Base. 

Expression       Associated knowledge 
is               Taxonomic Relation 
is a             Taxonomic Relation 
is a class of    Taxonomic Relation 
is a part of     Mereological Relation 
phenotype        Concept 
tight mutant     Concept 
leaky mutant     Concept 
type             Attribute 
tight            Value 
non-wild         Value 

Morphological Analysis 

A (Determiner) tight (Adjective) mutant (Noun) is (Verb) a (Determiner) mutant 
(Noun) which (Pronoun) displays (Verb) its (Determiner) non-wild (Adjective) type 
(Noun) phenotype (Noun) distinctly (Adverb) and (Conjunction) clearly (Adverb) 
while (Conjunction) a (Determiner) leaky (Adjective) mutant (Noun) displays (Verb) 
a (Preposition) much (Adverb) less (Adverb) distinct (Adjective) phenotype (Noun) 
compared (Verb) to (Preposition) wild (Adjective) type (Noun). 

Parsing Text and Looking for Knowledge 

As previously stated, the words classified as preposition, particle, conjunction, 
interjection, pronoun, and determiner are considered to be semantically meaningless. 
So, the system does not search for any knowledge associated with them. The system 
first searches for knowledge associated with tight. Thus, the system searches the 
knowledge base for similar expressions. In this case, the expressions similar to tight are 
{tight, tight mutant}, since they contain “tight” as a prefix. Then, the system searches 
for the most acceptable expression. In this case it finds the expression “tight mutant” 
in the text, so it infers that “tight mutant” is a concept. The next word to be analyzed 
is “is”. The similar expressions in the knowledge base are {is, is a, is a class of, is a 
part of}. Among them, the system finds that “is a” is acceptable. Thus, the system infers a 
taxonomic relation associated with the expression “is a”. The process continues and the 
system will finally obtain the following knowledge from the text: 

{tight mutant (Concept), is a (Taxonomic relation), mutant (Concept), non-wild
(Value), type (Attribute), phenotype (Concept), leaky mutant (Concept)}
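
The search just described can be illustrated with a short Python sketch. It is not the implementation of our tool, only a minimal reading of the steps above under stated assumptions: "similar" expressions are taken to be those knowledge base entries whose first word is the current word, the "most acceptable" one is assumed to be the longest entry that actually occurs in the text at that position, and the entry "mutant" is added to the knowledge base because the example treats it as a known concept. All names (MEANINGLESS, KNOWLEDGE_BASE, search_phase, and so on) are hypothetical.

```python
# Illustrative sketch only; names and the "longest match" rule are assumptions.
MEANINGLESS = {"Preposition", "Particle", "Conjunction",
               "Interjection", "Pronoun", "Determiner"}

# Expressions of Table 2. "mutant" is added as an assumption, because the
# worked example treats it as a known concept.
KNOWLEDGE_BASE = {
    "is": "Taxonomic relation",
    "is a": "Taxonomic relation",
    "is a class of": "Taxonomic relation",
    "is a part of": "Mereological relation",
    "phenotype": "Concept",
    "tight": "Value",
    "tight mutant": "Concept",
    "leaky mutant": "Concept",
    "type": "Attribute",
    "non-wild": "Value",
    "mutant": "Concept",
}

# Output of the morphological analysis, written as word/Category pairs.
TAGGED_TEXT = (
    "A/Determiner tight/Adjective mutant/Noun is/Verb a/Determiner "
    "mutant/Noun which/Pronoun displays/Verb its/Determiner "
    "non-wild/Adjective type/Noun phenotype/Noun distinctly/Adverb "
    "and/Conjunction clearly/Adverb while/Conjunction a/Determiner "
    "leaky/Adjective mutant/Noun displays/Verb a/Determiner much/Adverb "
    "less/Adverb distinct/Adjective phenotype/Noun compared/Verb "
    "to/Preposition wild/Adjective type/Noun"
)
TOKENS = [tuple(item.rsplit("/", 1)) for item in TAGGED_TEXT.split()]


def search_phase(tokens, kb):
    """Return the (expression, knowledge entity) pairs found, without repeats."""
    words = [word.lower() for word, _ in tokens]
    found = {}
    i = 0
    while i < len(tokens):
        if tokens[i][1] in MEANINGLESS:
            i += 1  # no knowledge is searched for these categories
            continue
        # Similar expressions: entries whose first word is the current word.
        similar = [expr.split() for expr in kb if expr.split()[0] == words[i]]
        # Acceptable expressions: those that occur in the text at this point.
        acceptable = [p for p in similar if words[i:i + len(p)] == p]
        if acceptable:
            best = max(acceptable, key=len)  # assumption: prefer the longest
            expression = " ".join(best)
            found.setdefault(expression, kb[expression])
            i += len(best)
        else:
            i += 1
    return list(found.items())


RESULT = search_phase(TOKENS, KNOWLEDGE_BASE)
```

Under these assumptions, RESULT contains the same seven labelled expressions as the set listed above, in order of first appearance in the text.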

Setting-in-a-Context Phase 

At this point, the system needs no further operations on the expressions whose
associated knowledge entity is a concept. Every other knowledge entity needs to
be conceptualised. In this example, the expressions “non-wild”, “type” and
“is a” have to be set into the correct context, and the system uses the grammar
patterns to relate the knowledge. For instance, the expression “non-wild” was
labeled as a Value in the previous phase, so the system needs to find its
related attribute. The expression “non-wild” is an adjective and the expression
“type” is a noun. According to the fourth grammar pattern, “non-wild” is a
property of “type”. Hence, the system infers that “non-wild” is a value of the
attribute “type”. Next, the system has to associate the attribute “type” with a
concept. Since “type” is a noun and “phenotype” is a noun, a grammar pattern is
found, and the system knows that “type” is a property of “phenotype”. From the
previous phase, “phenotype” is a concept and “type” is an attribute, so the
system infers that “type” is an attribute of the concept “phenotype”. The last
expression to be conceptualised is “is a”, which represents a taxonomic
relation. To find the participants of the relation, which have to be concepts,
the system searches to the left and to the right of the linguistic expression
“is a”. In this example, the system infers a taxonomic relation between the
expressions “tight mutant” and “mutant”.
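
The reasoning of this phase can be sketched in Python as follows. This is a simplified illustration rather than the code of our tool. It assumes that each labelled expression carries the part of speech of its head word and its token positions, that only the adjective-noun and noun-noun patterns are needed, and that the participants of a relation are the nearest concepts on each side. The names Found, PROPERTY_PATTERNS, LINKS and set_in_context are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Found:
    expression: str
    entity: str   # knowledge entity assigned in the search phase
    tag: str      # part of speech of the head word (assumption)
    start: int    # index of the first token of the expression
    end: int      # index one past its last token


# Patterns in which the previous word is a property of the current one.
PROPERTY_PATTERNS = {("Adjective", "Noun"), ("Noun", "Noun")}

# What each non-concept entity must be attached to, and how the link is named.
LINKS = {"Value": ("Attribute", "value of"),
         "Attribute": ("Concept", "attribute of")}

# Labelled expressions of the first clause of the example, in text order.
FOUND = [
    Found("tight mutant", "Concept", "Noun", 1, 3),
    Found("is a", "Taxonomic relation", "Verb", 3, 5),
    Found("mutant", "Concept", "Noun", 5, 6),
    Found("non-wild", "Value", "Adjective", 9, 10),
    Found("type", "Attribute", "Noun", 10, 11),
    Found("phenotype", "Concept", "Noun", 11, 12),
]


def set_in_context(found):
    """Relate values, attributes and relations to other knowledge entities."""
    facts = []
    for i, item in enumerate(found):
        if item.entity in LINKS:
            wanted, label = LINKS[item.entity]
            nxt = found[i + 1] if i + 1 < len(found) else None
            # The next expression must directly follow in the text, have the
            # wanted entity type, and match one of the grammar patterns.
            if (nxt is not None and nxt.start == item.end
                    and nxt.entity == wanted
                    and (item.tag, nxt.tag) in PROPERTY_PATTERNS):
                facts.append((item.expression, label, nxt.expression))
        elif item.entity.endswith("relation"):
            # Participants: nearest concept on the left and on the right.
            left = next((f for f in reversed(found[:i])
                         if f.entity == "Concept"), None)
            right = next((f for f in found[i + 1:]
                          if f.entity == "Concept"), None)
            if left is not None and right is not None:
                facts.append((left.expression, item.expression,
                              right.expression))
    return facts


FACTS = set_in_context(FOUND)
```

With this input, FACTS holds the three inferences of the example: the taxonomic relation between “tight mutant” and “mutant”, “non-wild” as a value of “type”, and “type” as an attribute of “phenotype”.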

6 Implementing a Software Tool

A software tool based on the approach described above has been designed and
implemented for acquiring knowledge from texts (the text must be supplied as a
plain text file, i.e., in ASCII format). The tool has two distinct working
modes: (1) the query mode and (2) the maintenance mode. In the maintenance mode,
users have the full functionality of the tool (adding new experts and tasks,
associating experts with tasks, saving the work/session(s) in the database,
loading previously saved work, etc.). The query mode has reduced functionality:
the user can neither perform management activities nor save work/sessions in the
database. Other differences between the two modes are: (a) in the maintenance
mode, the user inserts knowledge with the help of the tool, which proposes
knowledge to the user by making use of natural language recognition techniques;
and (b) in the query mode, the user cannot insert new knowledge, as ontologies
are built automatically. Non-expert users cannot carry out the following
actions: input knowledge into the system, select the expert, select the task, or
select a text for recognition. The tool uses five ontological knowledge
categories, namely:

• Concepts: these represent a class of objects in the domain. 

• Attributes: these represent the properties of a given concept. 

• Values: attributes belong to a domain; attributes such as length are numeric 
whereas attributes like colour are enumerated. The elements of those domains are 
the possible values attributes can take. 

• Relations: relations in a domain ontology play the same role as in an
entity/relationship model, although some constraints have been imposed. In this
tool, relations are binary and pre-defined: IS-A, PART-OF, ASSOCIATION, INFLUENCE.

• Axioms: an axiom is a domain rule. For instance, Force = mass * acceleration. 

The taxonomic and mereological relations exist only between two concepts. The
remaining relations can exist between any two ontological categories, although a
relation cannot be part of another relation.
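
As an illustration, the five categories and the constraints on relations could be represented as in the following Python sketch. It is a sketch under assumptions and not the data model of the tool: the class and function names are hypothetical, and "a relation cannot be part of another relation" is read here as "a relation cannot be a participant of a relation".

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CONCEPT = "concept"
    ATTRIBUTE = "attribute"
    VALUE = "value"
    RELATION = "relation"
    AXIOM = "axiom"


# The pre-defined binary relations named in the text.
RELATION_TYPES = {"IS-A", "PART-OF", "ASSOCIATION", "INFLUENCE"}
# Taxonomic and mereological relations hold only between two concepts.
CONCEPT_ONLY = {"IS-A", "PART-OF"}


@dataclass(frozen=True)
class Entity:
    name: str
    category: Category


def relation_allowed(kind, first, second):
    """Check whether a binary relation of this kind may link two entities."""
    if kind not in RELATION_TYPES:
        return False
    # Assumed reading: a relation cannot take part in another relation.
    if Category.RELATION in (first.category, second.category):
        return False
    if kind in CONCEPT_ONLY:
        return (first.category is Category.CONCEPT
                and second.category is Category.CONCEPT)
    return True


tight_mutant = Entity("tight mutant", Category.CONCEPT)
mutant = Entity("mutant", Category.CONCEPT)
type_attribute = Entity("type", Category.ATTRIBUTE)
is_a = Entity("is a", Category.RELATION)
```

For example, relation_allowed("IS-A", tight_mutant, mutant) holds, whereas an IS-A link between type_attribute and mutant is rejected because one participant is not a concept.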

The system can recognize entities from all of these categories except axioms.
The structure of the ontologies produced by our system can be seen in Figure 1.
The tree on the left-hand side of Figure 1 is the ontology, which has three main
branches: concepts, relations, and rules (i.e., axioms). Axioms appear as
branches of the “rules” node. Each concept has branches for its attributes, and
each attribute has branches for its values. The relations are represented as
branches of the “relationships” node, and the instances of a relation can be
viewed on the right side of the screen (the IS-A relation, in the case of
Figure 1).
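
The tree structure just described can be mimicked with nested dictionaries, as in the Python sketch below. The sketch only illustrates the shape of the tree; the content is taken from the example of Section 5, and the variable and function names are hypothetical and do not come from the tool.

```python
# Ontology tree: concepts -> attributes -> values; relations; rules (axioms).
ONTOLOGY = {
    "concepts": {
        "phenotype": {"type": ["non-wild"]},
        "tight mutant": {},
        "mutant": {},
        "leaky mutant": {},
    },
    "relationships": {"IS-A": []},
    "rules": [],
}

# Relation instances are kept apart, as they are shown in a separate panel.
INSTANCES = {"IS-A": [("tight mutant", "mutant")]}


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = []
    if isinstance(node, dict):
        for key, child in node.items():
            lines.append("  " * depth + key)
            lines.extend(render(child, depth + 1))
    else:
        # A list holds the leaves of a branch (e.g., the values of an attribute).
        for leaf in node:
            lines.append("  " * depth + str(leaf))
    return lines


TREE_LINES = render(ONTOLOGY)
```

Printing "\n".join(TREE_LINES) gives an indented outline with the three main branches at the top level and the value “non-wild” nested under the attribute “type” of the concept “phenotype”.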

Fig. 1. Analysis of a text fragment.

In order to evaluate the usefulness of the approach in real settings, a case
study (experiment) was performed. It consisted of applying the approach to
several sub-domains of Computer Science with ‘simulated experts’, namely
5th-year students instructed for the experiment (one expert per sub-domain). The
instruction was done by providing abundant information on the sub-domain they
were to work in. To ensure motivation, the students were first shown a list
describing each sub-domain used in the case study (all already well known to
them from the corresponding subjects in their degree), and they then selected
those they found most ‘attractive’. In this way, we tried to ensure that each
‘expert’ was motivated enough to do his/her job in the experiment. Finally, each
expert was given a text from the domain on which (s)he had been instructed. We
then checked whether the assumptions pointed out before were too strong. The
results of the knowledge acquisition process from text in this experiment show
that, at least, the simulated experts used in this case study overcame the
technical, implicit restrictions of our approach and extracted and represented
explicit knowledge from text, as was our goal. The data of the experiment and
the resulting ontologies can be accessed at our web page.



7 Related Work 

The way we approach knowledge structuring differs from the one presented in
[5,11] in that our knowledge entities are concepts, attributes, values,
relations, and rules, whereas the quoted work deals with concepts, roles,
individuals and axioms. Another difference with [5,11] is that the concept
acquisition process is also performed differently: our system’s suggestions are
hypotheses that the user accepts or rejects, whereas in [8] the process is
structured into three phases: (1) generating quality labels for hypotheses; (2)
estimating the credibility of the hypotheses; and (3) computing the order of
preference of the hypotheses.

The expression-oriented analysis used to capture knowledge from text in the
system presented here is somewhat more general than the classic word-based
approach described in [3], in which words can be derived from other words by
means of transformation rules. The semantics associated with terms has also been
dealt with elsewhere. In particular, in [6] the author recognises that semantic
variations make it possible to recognise, for example, verbal and adjectival
phrases as conceptually equivalent to nominal terms. Concerning tools for term
acquisition from text, others are well known in the literature, for instance
LEXTER [2], which was built for term acquisition from French corpora. In our
work, we go beyond term extraction to distinguish several kinds of semantic
terms through several ontological knowledge categories. The use of ontologies
for knowledge acquisition from text is discouraged in [7,9] for domains in which
changes in expert knowledge are rapid and substantial. However, we believe we
have shown that our approach can easily be adapted to new requirements.

8 Discussion and Conclusion

In this paper, an approach that combines knowledge acquisition and natural
language recognition techniques has been used to implement a system capable of
extracting knowledge from natural language texts in a supervised mode. The
methodology presented in this work offers a new and promising method for
knowledge acquisition from text. The system has been evaluated in one case study
(i.e., a Computer Science domain), and several ontologies, each corresponding to
a different sub-domain, have been built by applying the framework described in
this paper to a set of natural language texts on that domain. We are confident
that this approach to acquiring knowledge from text offers some advantages over
purely linguistic methods, such as: (1) ambiguity is taken into account (i.e.,
person dependency, spatial location, domain dependency); (2) rhetoric is not
considered; (3) implicit knowledge can be identified and added by the user; (4)
the system is incremental and automatic; and (5) the system’s performance and
transparency are acceptable.

The division of the acquisition process into two phases (i.e., the ‘search’ and
‘setting-in-a-context’ phases) allows the system, in principle, to be used for
any language. However, some considerations need to be made regarding these two
phases. We propose some improvements to the setting-in-a-context phase regarding
different knowledge entities: (a) relations (i.e., solving situations such as
those in which no participants appear on either side of the relation; a possible
solution might be to check whether the concept is directly followed by an
attribute, in which case it is more likely that the attribute, and not the
concept, is the second participant); (b) pronouns (these are not dealt with in
this work and would be another interesting issue to address). Regarding future
work, we plan to perform a statistical evaluation of the system in other domains
such as medicine, where we have already performed some promising experiments.



Acknowledgements 

We thank the Spanish Ministry for Science and Technology for its support for the
development of the system through projects TIC2002-03879, FIT-110100-2002-78,
FIT-150500-2002-376, FIT-150500-2003-499, FIT-150500-2003-503,
FIT-110100-2003-73, FIT-150500-2003-505, and the Murcian Regional Government
through project 2I03SIU0039. We also thank the European Commission for its
support under project ALFA II0092FA.



References 

1. M. Aronoff, Word Formation in Generative Grammar, MIT Press, 1976. 

2. D. Bourigault, LEXTER, a Natural Language tool for terminology extraction. In
Proceedings, 7th EURALEX International Congress, 771-779, Goteborg, Sweden, 1996.

3. N. Chomsky, Knowledge of Language: Its Nature, Origin, and Use, Praeger, 1986.

4. J.T. Fernandez-Breis, D. Castellanos-Nieves, R. Valencia-Garcia, P.J.
Vivancos-Vicente, R. Martinez-Bejar, and M. De las Heras-Gonzalez, Towards Scott
domains-based topological ontology models. An application to a cancer domain, in
Proceedings of International Conference on Formal Ontology in Information
Systems. Maine, USA, 2001.






5. U. Hahn & K. Schnattinger, An Empirical Evaluation of a System for Text
Knowledge Acquisition. In Proceedings of the European Knowledge Acquisition
Workshop, 129-144, Sant Feliu de Guixols, Spain, 1997.

6. C. Jacquemin, Spotting and Discovering Terms through Natural Language
Processing, MIT Press, 2001.

7. D.M. Jones & R.C. Paton, Acquisition of Conceptual Structure in Scientific
Theory. In E. Plaza & R. Benjamins (Eds), Proceedings of the European Knowledge
Acquisition Workshop, 145-158, Sant Feliu de Guixols, Spain, 1997.

8. M.A. Musen, ‘Domain Ontologies in Software Engineering: Use of Protege with
the EON Architecture’, Methods of Information in Medicine, 37, 540-550, (1998).

9. D.E. O'Leary, ‘Impediments in the use of explicit ontologies for KBs
development’, International Journal of Human-Computer Studies, 46, 327-338,
(1997).

10. J.R. Quinlan, C4.5: Programs for Machine Learning, San Mateo: Morgan
Kaufmann, 1993.

11. M. Romacker and U. Hahn, Context-based Ambiguity Management for Natural
Language Processing, Lecture Notes in Artificial Intelligence 2116, 184-197,
(2001).

12. J.M. Ruiz-Sanchez, R. Valencia-Garcia, J.T. Fernandez-Breis, R.
Martinez-Bejar, and P. Compton, An approach for incremental knowledge
acquisition from text. Expert Systems with Applications, 25(2):77-86, (2003).

13. R.I. Sanchez-Carreno, J.T. Fernandez-Breis, R. Martinez-Bejar & P.
Cantos-Gomez, ‘An ontology-based approach to knowledge acquisition from text’,
Cuadernos de Filologia Inglesa, 9(1), 191-212, (2000).

14. L. Thomas, Beginning Syntax, Oxford: Blackwell, 1993.

15. G. Van Heijst, A.T. Schreiber & B.J. Wielinga, ‘Using explicit ontologies in
KBS development’, International Journal of Human-Computer Studies, 45, 183-292,
(1997).



